Lora Training using only ComfyUI!!

AIFuzz
27 Feb 2024 | 11:14

TLDR: In this video, Marcus introduces a custom node for ComfyUI that lets users train LoRAs (Low-Rank Adaptation models) entirely within the ComfyUI interface, without external tools like Google Colab or Kaggle. He explains how to create a dataset, prepare text captions for the images, and configure the nodes that run the training. The video walks through the whole workflow, from dataset preparation to generating and saving the final LoRA model.

Takeaways

  • 🚀 With a custom node, ComfyUI can now train LoRAs end to end without relying on other tools or platforms like Kohya, Kaggle, or Google Colab.
  • 📂 Training requires a dataset of images, placed in a folder that follows the naming structure the node expects.
  • 🖼️ The demonstration uses a dataset of 24 sketches, but the dataset can be larger or smaller.
  • 💾 The folder where the images are stored must be named correctly, or the training node will not work properly.
  • 📑 A dedicated captioning node creates a text file for each image in the dataset, describing the content of that image.
  • 🔧 The process needs a ComfyUI install running Torch cu121 (PyTorch built for CUDA 12.1); other versions may cause errors.
  • 🛠️ Configuring the LoRA training node means setting parameters such as checkpoint name, image path, batch size, and training epochs.
  • ⏳ Training saves a LoRA file at a set epoch interval (every 10 epochs in the demonstration).
  • 🎨 Once training is done, you can use the trained LoRA for your own image generation.
  • 📀 The entire process runs inside ComfyUI, with no external tools.

Q & A

  • What is the main topic of the video?

    -The video is about training LoRAs (Low-Rank Adaptation models) using only ComfyUI, without needing third-party tools like Kohya, Kaggle, or Google Colab.

  • What is the major benefit of the new node in ComfyUI?

    -The major benefit is that it allows users to fully train LoRAs directly within ComfyUI, simplifying the process by eliminating the need for additional tools or platforms.

  • How do you start training a LoRA in ComfyUI?

    -To start training, you need a dataset of images, preferably between 25 and 50. Place the images in a folder named according to the structure the node expects, then generate caption files with the Lora caption nodes.

  • What are the requirements for the dataset images?

    -The dataset images should ideally be in PNG format, and they don't need to be the same size. The folder containing these images must be named according to the specific requirements of the node.

  • What is the purpose of generating text captions for each image?

    -The text captions provide a description of what's in each image, which helps the model understand the content better during training.

  • What specific version of ComfyUI is needed for LoRA training?

    -You need a ComfyUI install running Torch cu121 (PyTorch built for CUDA 12.1) for the LoRA training node to work. Other versions might cause errors.

  • How do you generate text captions for the dataset in ComfyUI?

    -You can use the 'Lora caption save' and 'Lora caption load' nodes along with the WD14 Tagger, which generates a caption for each image in the dataset.

  • What are the key parameters to configure in the LoRA training node?

    -Key parameters include the base checkpoint, path to the images, batch size, max training epochs, output name, and output directory. These parameters define how the training process will proceed.

  • How does the training process work in terms of saving checkpoints?

    -The training process saves a LoRA file at a set epoch interval. For example, if training runs for 50 epochs and is set to save every 10 epochs, files are written at epochs 10, 20, 30, 40, and 50.

  • What are the outputs of the LoRA training process?

    -The output is a trained LoRA file, which is saved in the specified output directory. This file can be used to apply the trained model directly in ComfyUI for generating images.

Outlines

00:00

🚀 Introducing Full LoRA Training in ComfyUI

The presenter, Marcus, announces a major update for ComfyUI users: LoRAs can now be trained entirely within ComfyUI, without external tools like Kohya, Kaggle, or Google Colab. He shares a link to the GitHub repository for the node, 'Lora Training in Comfy' by LarryJane491. Marcus shows how to start by creating a dataset of images (PNG format, ideally 25 to 50 images) and storing them in a folder whose name carries a leading number (5 in the demonstration). The correct folder structure is essential, and Marcus recommends a fresh install of ComfyUI with the required Torch cu121 build (PyTorch for CUDA 12.1) to avoid errors.
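
As a rough illustration of the dataset layout, the Python sketch below creates a numbered image folder and lists the images in it. Everything named here is hypothetical: the folder name `5_sketches`, the helper functions, and the list of file extensions. The `<number>_<label>` pattern is an assumption based on the numbered folder mentioned in the video and on the convention used by Kohya-style trainers, so the node's README remains the authority on the exact rule.

```python
import re
import tempfile
from pathlib import Path

# Assumption: illustrative extensions only; the video itself mentions PNG.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def is_numbered_folder(name):
    """Return True if name looks like '<number>_<label>', e.g. '5_sketches'."""
    return re.fullmatch(r"\d+_.+", name) is not None


def list_images(folder):
    """Return the image files directly inside folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def make_demo_dataset(root, count=3):
    """Create root/5_sketches with `count` empty placeholder PNG files."""
    folder = Path(root) / "5_sketches"  # hypothetical folder name
    folder.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        (folder / f"sketch_{i:02d}.png").touch()
    return folder


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = make_demo_dataset(tmp)
        print(folder.name, is_numbered_folder(folder.name))
        print([p.name for p in list_images(folder)])
```

Kohya-style trainers are usually pointed at the parent directory rather than at the numbered folder itself; whether this node behaves the same way should be confirmed in its README.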

05:00

🖼️ Creating Text Captions for Images

Marcus explains how to generate text captions for the image dataset, using nodes from the GitHub repository. He demonstrates the Lora caption save and load nodes together with the WD14 Tagger, which analyzes each image and produces a matching text file. These captions give the model context about the images during training. After the captions are generated, the dataset folder contains one text file per image. Marcus highlights the importance of correctly linking paths and settings, including using the name list to prompt the system.
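
Before training, it can help to confirm that every image actually received a caption. The sketch below is one way to check, assuming that each caption is a `.txt` file with the same base name as its image and in the same folder (the usual Kohya-style layout; the video only says a text file is created per image). The function name and extension list are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: illustrative extensions only; the video itself mentions PNG.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def missing_captions(folder):
    """Return names of images in folder that lack a same-named .txt file."""
    missing = []
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip caption files and anything that is not an image
        if not image.with_suffix(".txt").exists():
            missing.append(image.name)
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "a.png").touch()
        (folder / "a.txt").write_text("sketch, monochrome")
        (folder / "b.png").touch()  # deliberately left without a caption
        print(missing_captions(folder))
```

An empty result means every image has a caption file next to it.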

10:03

🔧 Setting Up the Magic Node for Training

This section covers configuring the 'magic node' that makes LoRA training possible directly in ComfyUI. Marcus explains the node's settings, such as the base checkpoint, image path, batch size, and number of training epochs, and shows how to make sure the node finds the correct folder. He advises a batch size of 1 and 50 training epochs, with a LoRA file saved every 10 epochs.
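
To make the relationship between epochs and saved files concrete, here is a small Python sketch. The `TrainingSettings` class and its field names are hypothetical stand-ins for the settings named in the video, not the node's actual inputs, and the sketch assumes a file is written at every multiple of the save interval plus once at the final epoch.

```python
from dataclasses import dataclass


@dataclass
class TrainingSettings:
    """Hypothetical container for the settings discussed in the video."""
    checkpoint_name: str            # base checkpoint to train on
    data_path: str                  # where the image dataset lives
    batch_size: int = 1             # value advised in the video
    max_train_epochs: int = 50      # value advised in the video
    save_every_n_epochs: int = 10   # save interval used in the video
    output_name: str = "my_lora"    # placeholder output name
    output_dir: str = "loras"       # placeholder output directory


def save_epochs(settings):
    """Return the epochs at which a LoRA file would be written."""
    interval = settings.save_every_n_epochs
    epochs = list(range(interval, settings.max_train_epochs + 1, interval))
    # Assumption: the final epoch is always saved, even off-interval.
    if not epochs or epochs[-1] != settings.max_train_epochs:
        epochs.append(settings.max_train_epochs)
    return epochs


if __name__ == "__main__":
    settings = TrainingSettings("base.safetensors", "lora_data")
    print(save_epochs(settings))  # [10, 20, 30, 40, 50]
```

Keeping several intermediate files is useful because an earlier epoch sometimes produces better images than the final one.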

🎨 Training and Using the Lora Model

Once the setup is complete, Marcus walks through the process of training the Lora model. He keeps an eye on the training progress by monitoring the steps and explains how the system buckets the images during training. After the process, Marcus verifies that the training is complete, and a new Lora model is created. He tests the newly trained Lora with different prompts and applies it to sketch-style images, using only ComfyUI to handle the entire process without external tools. He concludes by showcasing examples of trained Loras and mentions that additional resources are available in the video's description.
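
The step counter Marcus watches can be estimated ahead of time. The sketch below uses the usual Kohya-style arithmetic, which is an assumption here: each image is seen `repeats` times per epoch, and `repeats` is taken from the number at the start of the folder name. Bucketing images by size can change the per-epoch batch count slightly, so treat the result as an estimate.

```python
import math


def total_steps(num_images, repeats, epochs, batch_size=1):
    """Estimate optimizer steps for a whole training run."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # 24 images, 5 repeats, 50 epochs, batch size 1
    print(total_steps(24, 5, 50))  # 6000
```

Under these assumptions, the 24-image demonstration dataset with 5 repeats, 50 epochs, and a batch size of 1 would come to 6000 steps.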

👋 Conclusion and Final Remarks

Marcus wraps up the video by showing examples of images produced with LoRAs he trained before the recording. He signs off, mentioning that his crew will be back with another AIFuzz video soon.

Keywords

💡ComfyUI

ComfyUI is a node-based interface for building and running Stable Diffusion image-generation workflows. The video highlights that, with a custom node, users can now train LoRA models entirely within ComfyUI without relying on external platforms like Kaggle or Google Colab.

💡Lora

LoRA (Low-Rank Adaptation) is a fine-tuning technique that trains a small set of added low-rank weights instead of the full model, so the result is a compact file applied on top of a base checkpoint. In the video, the speaker demonstrates how to train LoRA models entirely within ComfyUI, making the process more streamlined.

💡Node

In this context, a 'Node' refers to a functional component in ComfyUI that helps users execute tasks such as training Lora models. The video explains that users can train Lora models using a specific node developed for ComfyUI.

💡Data set

A data set refers to a collection of images that will be used to train AI models. In the video, the speaker demonstrates how to prepare a data set of sketches to train the model using ComfyUI.

💡Text captions

Text captions are descriptions generated for each image in the data set to help the AI model understand the content. The video shows how to create text captions for images using a specific node in ComfyUI.

💡Training Epoch

An epoch refers to one complete cycle through the entire data set during model training. In the video, the speaker sets up the training to save the Lora model at various epochs as it processes the images.

💡Base checkpoint

A base checkpoint is the starting model that the user builds upon during training. In the video, the speaker explains the importance of selecting the correct base checkpoint to ensure that the Lora model trains effectively.

💡Batch size

Batch size refers to the number of images processed at once during training. The video suggests setting the batch size to 1 for training Lora models in ComfyUI, emphasizing the impact of batch size on training efficiency.

💡Clip skip

Clip skip controls how many of the final layers of the CLIP text encoder are skipped when prompts are encoded. In the video, the speaker advises setting clip skip to 2 for LoRA training.

💡GitHub

GitHub is a platform for hosting and sharing code. In the video, the speaker provides a GitHub link to the node required for Lora training in ComfyUI, allowing others to access and use the necessary tools.

Highlights

You can now train Loras exclusively within ComfyUI using a single node, without the need for external tools like Google Colab.

The LoRA training node for ComfyUI is available on GitHub, created by LarryJane491.

The training process starts by creating a dataset of images, which should be stored in a specially named folder.

The demonstration trains on a dataset of 24 images, though 25 to 50 is typical.

The dataset images don't need to be the same size or format, but they should be placed in the correct directory structure.

Text captions are created for each image in the dataset to help the AI understand the content during training.

Make sure your ComfyUI install runs Torch cu121 (PyTorch built for CUDA 12.1) to avoid issues with the training process.

The WD14 Tagger node is used to analyze each image and generate text captions describing the content.

Training progress is saved at set intervals, allowing you to track the steps and the evolution of the Lora model.

The final output is a fully trained Lora model, saved automatically in the correct folder.

No external platforms like Kaggle or Colab were used in this process; everything was done in ComfyUI.

The node offers various customizable options such as checkpoint name, base checkpoint, and output directory.

The demonstration trains with a batch size of 1 and saves a LoRA file every 10 epochs.

At the end of the training, the Lora model is fully ready for use with your specific dataset and preferences.

ComfyUI has made it incredibly simple and streamlined to train Loras, making it accessible for more users.