[Stable Diffusion] A Detailed Guide to SVD in Forge | Image-to-Video Generation | Stable Forge

Canada Jasmine Studio
16 Mar 2024 05:34

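TLDR This video explains how to generate videos with the SVD feature of Stable Diffusion. First, download and install Forge, then download an SVD model from Civitai and place it in the designated directory. After selecting the model, set the video parameters, such as frame count, frame rate, and motion strength. Generation takes a long time, but the results are satisfying. The finished video and its frame images are saved under Forge's output directory.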

Takeaways

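  • 📽️ Stable Video Diffusion (SVD) is the image-to-video feature in Forge; Forge has no text-to-video feature.
  • 🔧 Models for SVD can be downloaded from Civitai; the base model, 4.45GB in size, is recommended.
  • 🖼️ Provide an input image and select an SVD model. The supported image sizes are 1024x576 and 576x1024, depending on the orientation of the image.
  • 🎞️ SVD offers two video generation models: img2vid generates 14 frames and img2vid-xt generates 25 frames. The frame rate can be set to 6.
  • 🚶‍♂️ Motion Bucket ID represents motion strength; the default is 127, and larger values produce larger changes in motion.
  • 🔄 The higher the Augmentation Level, the less the generated video resembles the original image; the default is 0.
  • 🔍 The guidance scale grows linearly from min CFG to CFG, which affects how much the frames change; the default min CFG is 1.0.
  • ⏱️ The sampler supports euler a and the scheduler uses Karras; both can be adjusted as needed.
  • 💻 Generated videos are saved in the webui/output directory under the Forge directory, where each generated still frame can also be viewed.
  • ⏳ Although 8GB of VRAM is officially recommended, a device with 4GB of VRAM can also generate videos, only more slowly.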

Q & A

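  • What is SVD?

    -SVD stands for Stable Video Diffusion, a technique for generating videos.

  • How do I install and use the SVD feature?

    -Once Forge is installed, the SVD feature is already built in. In Forge, you use it by selecting an SVD checkpoint model.

  • Where can I download SVD models?

    -SVD models can be downloaded from the Models section of Civitai. Select SVD and filter by checkpoint to find the different SVD models.

  • Where should the downloaded SVD model be placed?

    -The downloaded model file should go in the svd directory, inside the models directory, inside the webui directory of the Forge main directory.

  • Which video resolutions does SVD support?

    -SVD supports two resolutions, 1024x576 and 576x1024. Choose the one that matches whether the input image is landscape or portrait.

  • How many frames of video can SVD generate?

    -The number of frames depends on the model used: the img2vid model generates 14 frames and the img2vid-xt model can generate 25 frames.

  • What is FPS, and what does it do in SVD?

    -FPS is the frame rate, which determines how many frames the video has per second. For example, an FPS of 6 means 6 frames of video per second.

  • What does Motion bucket ID represent in SVD?

    -Motion bucket ID represents the motion strength in the video. The higher the value, the larger the changes in motion. The range is 0 to 1023 and the default is 127.

  • What does the Augmentation level parameter do in SVD?

    -Augmentation level affects how closely the generated video resembles the original image. The higher the value, the more the video differs from the original; the lower the value, the greater the similarity.

  • What do the sampling steps do in SVD?

    -The sampling steps value is the number of iterations the algorithm uses when generating each frame of video. It affects both the quality and the speed of generation, and the default is 20.

  • How should I understand the CFG guidance scale in SVD?

    -In the algorithm, the CFG guidance scale follows a linear ramp: it grows linearly from the min CFG value to the maximum CFG value, which affects how much the generated video changes.

  • Where are the generated video and the individual frame images stored?

    -The generated video and the individual frame images are stored in the SVD folder inside the output directory, inside the webui directory of the Forge directory.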

Outlines

00:00

🎥 Introduction to SVD and Model Setup

The speaker begins by introducing Stable Video Diffusion (SVD), a feature that ships with Forge. They explain that Forge supports only image-to-video generation, not text-to-video. The user is guided to download an SVD checkpoint model from Civitai and place it in the specified directory within Forge. The video then covers model options, resolution settings (1024x576 or 576x1024), frame counts (14 or 25 frames), frame rate (FPS), motion intensity (motion bucket ID), and augmentation level. The speaker also explains the sampling steps and the CFG parameters that guide generation. The segment ends with a demonstration of video generation, noting how long it takes and that 8GB of VRAM is recommended, though the speaker managed with 4GB.

05:02

🖥️ Accessing Generated Video Frames

This segment describes how to access the generated video frames. The user is instructed to copy the image address from the interface and paste it into a file manager to locate the directory where the static frames are stored. The directory path is explained, and the speaker notes that all generated images are saved in a specific folder within the /temp/gradio directory. The speaker thanks the viewers for watching and hints at the next video in the series.

Keywords

💡Stable Video Diffusion (SVD)

Stable Video Diffusion (SVD) refers to a model within the realm of AI-generated content, specifically designed to create videos from static images. In the context of the video, SVD is a feature integrated within the Forge application, which allows for the generation of videos from images. The video script describes the process of using SVD by selecting a model checkpoint and generating frames to create a video sequence.

💡Forge

Forge is the software application used in the video, and it has SVD functionality built in. It generates videos from images using SVD models. The script explains that Forge offers image-to-video generation but not text-to-video generation, which might be available in other interfaces such as ComfyUI.

💡Civitai

Civitai is identified as a platform where various AI models can be downloaded, including those for SVD. The script instructs viewers to download the SVD model from Civitai, specifically from the Models section and then filter by 'SVD' to find the appropriate model for video generation.

💡Checkpoint

In the video script, a checkpoint refers to a saved model file used in machine learning, which is necessary for the SVD process. The model checkpoint is downloaded from Civitai and placed in a specific directory within the Forge application to enable video generation.
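The placement described above can be sketched with `pathlib`. This is only an illustration: the Forge root `forge` and the filename `svd_xt.safetensors` are hypothetical placeholders, and only the `webui/models/svd` layout comes from the video.

```python
from pathlib import Path


def svd_model_path(forge_root: str, filename: str) -> Path:
    """Return where an SVD checkpoint would sit under a Forge install.

    Only the webui/models/svd layout comes from the video; both
    arguments are hypothetical placeholders supplied by the caller.
    """
    return Path(forge_root) / "webui" / "models" / "svd" / filename


# Hypothetical install root and checkpoint filename, for illustration only.
target = svd_model_path("forge", "svd_xt.safetensors")
print(target.as_posix())
```

After copying a file to that location, the model list in Forge needs to be refreshed before the checkpoint appears.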

💡Width and Height

These terms are used to describe the dimensions of the video to be generated. The script specifies two options for width and height: 1024x576 or 576x1024. The choice depends on the orientation of the input image, with landscape images typically using 1024x576.
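As a rough sketch of that choice, the helper below picks one of the two sizes from the input image's dimensions. The function name is hypothetical, and treating a square image as landscape is an assumption; the video only covers landscape and portrait inputs.

```python
from typing import Tuple


def pick_svd_resolution(image_width: int, image_height: int) -> Tuple[int, int]:
    """Pick (width, height) for SVD based on the input image orientation."""
    # Assumption: square images are treated as landscape.
    if image_width >= image_height:
        return (1024, 576)  # landscape
    return (576, 1024)      # portrait


print(pick_svd_resolution(1920, 1080))
print(pick_svd_resolution(1080, 1920))
```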

💡Video Frames

Video frames are the individual images that make up a video. The script mentions two SVD models from Stability AI that determine the number of frames: 'img2vid' generates 14 frames, and 'img2vid-xt' can generate up to 25 frames. The latter is the model used in the script's example.

💡FPS (Frames Per Second)

FPS indicates the frame rate of the video, which is crucial for determining the smoothness of the video playback. The script explains that an FPS of 6 means there are 6 frames in one second, and with 25 frames total, the video will be approximately 4 seconds long.
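The arithmetic behind that estimate is simply frame count divided by frame rate. A minimal sketch, with a hypothetical helper name:

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Length of a clip in seconds: total frames divided by frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Frame counts for the two models mentioned in the video, at 6 FPS.
for name, frames in (("img2vid", 14), ("img2vid-xt", 25)):
    print(f"{name}: {clip_duration_seconds(frames, 6):.2f} s")
```

At 6 FPS this gives about 2.33 s for 14 frames and about 4.17 s for 25 frames.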

💡Motion Bucket ID

Motion Bucket ID is a parameter that controls the intensity of motion in the generated video. A higher value indicates greater motion variation. The script sets this value to 127 by default, which balances the motion intensity in the video.

💡Augmentation Level

This parameter affects how similar the generated video is to the original image. A higher augmentation level results in a video that is less similar to the input image, while a lower level increases the similarity. The script sets this to 0 for maximum similarity.

💡CFG (Classifier-Free Guidance)

CFG is the guidance scale used during video generation. The script discusses two settings, 'min CFG' and 'CFG': the guidance scale grows linearly from the min CFG value to the CFG value, which affects how much the video varies from the first frames to the last.
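A linear ramp of this kind can be sketched as one guidance value per frame. This is an illustration of the idea rather than Forge's actual implementation, which the video does not show; the function name is hypothetical, and the upper value of 3.0 is an assumed example, since the video only states the default min CFG of 1.0.

```python
from typing import List


def linear_cfg_schedule(min_cfg: float, cfg: float, num_frames: int) -> List[float]:
    """Return one guidance value per frame, rising linearly from min_cfg to cfg."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [min_cfg]
    step = (cfg - min_cfg) / (num_frames - 1)
    return [min_cfg + i * step for i in range(num_frames)]


# 25 frames as with img2vid-xt; cfg=3.0 is an assumed example value.
schedule = linear_cfg_schedule(min_cfg=1.0, cfg=3.0, num_frames=25)
print(schedule[0], schedule[12], schedule[-1])
```

Under this sketch the first frame is guided least and the last frame most, so a wider gap between the two values means more variation across the clip.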

💡Sampling Algorithm

The sampling algorithm is part of the process that determines how new frames are generated in the video. The script mentions 'euler a' as an example of a sampling algorithm used in the video generation process.

💡Random Seed

A random seed is used in the generation process to ensure that the results are reproducible. The script does not go into detail about the random seed but implies its importance in the video generation process.

Highlights

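SVD (Stable Video Diffusion) is a feature built into Forge for generating video from images.

Forge has only image-to-video, not text-to-video.

ComfyUI may support text-to-video.

Before using SVD, you need to download a model file and put it in place.

Model files can be downloaded from Civitai.

Model files should be placed in the webui/models/svd directory under the Forge main directory.

After a model is downloaded, the list must be refreshed before the model appears.

Video generation supports two resolutions: 1024x576 and 576x1024.

There are two choices of frame count: 14 frames or 25 frames.

FPS (frame rate) affects the playback speed of the video.

Motion bucket ID controls the strength of motion changes in the video.

Augmentation level affects how similar the video is to the original image.

The sampling steps value is a parameter the algorithm uses when generating each frame.

The CFG value controls how much the video frames change from one to the next.

Sampling denoise is the sampling denoise parameter, but its exact effect is not explained in detail.

Sampler name is the sampling algorithm in use.

Karras is the recommended scheduler.

The random seed can affect the result of video generation.

Video generation calls for at least 8GB of VRAM, but it also runs with 4GB of VRAM, only more slowly.

The generated video and each frame image can be downloaded.

All generated videos and images are saved in the webui/output/SVD directory under the Forge directory.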