DataGenerationConfig

Use a model to generate synthetic data for fine-tuning an LLM.

Key                     Type           Description
generationInstructions  str            Instructions for the data generation model.
descriptionCol          str            Name of the description column.
frequencyPenalty        float          Penalty applied to tokens according to how often they have already appeared.
completionCol           str            Name of the output (completion) column.
documentationCharLimit  int            Character limit for documentation.
fewshotExamples         int            Number of few-shot examples used to prompt the model.
promptCol               str            Name of the input (prompt) column.
idCol                   str            Name of the identifier column.
examplesPerTarget       int            Number of examples to generate per target.
verifyResponse          bool           Whether to verify the model's responses.
tokenBudget             int            Token budget for generation.
seed                    Optional[int]  Seed for random number generation.
model                   str            Model used for data generation.
concurrency             int            Number of concurrent processes.
subsetSize              Optional[int]  Size of the subset to use for generation.
temperature             float          Sampling temperature for the model.
oversample              bool           Whether to oversample the data.