audio-ldm

Need a specific sound for your project? The AudioLDM model generates audio from a text prompt, with adjustable duration and quality settings. It is useful for filmmakers, video content creators, and podcasters.

Required: 5,000 $wRAI

21k runs

Input

text: Text prompt from which to generate audio.

duration: Duration of the generated audio, in seconds. Longer durations may cause an out-of-memory (OOM) error.

guidance_scale: Guidance scale for the model. A larger scale gives better quality and relevance to the text; a smaller scale gives more diversity.

random_seed: Random seed for the model (optional).

n_candidates: Number of candidate audio clips to generate; the best one is returned.
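
As a rough illustration of how the inputs above fit together, the following Python sketch collects them into a single request dictionary and applies basic sanity checks. The `AudioLDMInput` class and the `to_payload` helper are hypothetical names invented for this example; they are not part of the platform or of AudioLDM. The defaults of 5.0 seconds and a guidance scale of 2.5 are taken from the examples on this page, and the default of one candidate is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AudioLDMInput:
    """Hypothetical container for the inputs listed on this page."""

    text: str
    duration: float = 5.0          # seconds; value used in the page's examples
    guidance_scale: float = 2.5    # value used in the page's examples
    random_seed: Optional[int] = None
    n_candidates: int = 1          # assumed default, not stated on the page

    def __post_init__(self) -> None:
        # Basic sanity checks; the real service may enforce different limits.
        if not self.text.strip():
            raise ValueError("text must be a non-empty prompt")
        if self.duration <= 0:
            raise ValueError("duration must be positive")
        if self.guidance_scale < 0:
            raise ValueError("guidance_scale must not be negative")
        if self.n_candidates < 1:
            raise ValueError("n_candidates must be at least 1")


def to_payload(inp: AudioLDMInput) -> dict:
    """Convert the input to a plain dict, omitting the optional seed if unset."""
    payload = asdict(inp)
    if payload["random_seed"] is None:
        del payload["random_seed"]
    return payload


if __name__ == "__main__":
    example = AudioLDMInput(text="docking of two spacecraft", random_seed=26260)
    print(to_payload(example))
```

Validating before submission is a design choice for this sketch: since long durations may run out of memory, it can be cheaper to reject obviously invalid requests locally than to wait for a failed run.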

Hold at least 5,000 wRAI to use this model

The Multi AI platform is free to use, but most models are accessible only to wRAI token holders. If you have any questions, feel free to ask in our Telegram chat.

Input

text: docking of two spacecraft

duration: 5.0

random_seed: 26260

n_candidates: (not specified)

guidance_scale: 2.5

Input

text: stream high in the mountains

duration: 5.0

random_seed: 682546

n_candidates: (not specified)

guidance_scale: 2.5

Readme

AudioLDM generates text-conditional sound effects, human speech, and music. It enables zero-shot text-guided audio style-transfer, inpainting, and super-resolution.

Tricks for Enhancing the Quality of Your Generated Audio

Try to use more adjectives to describe your sound. For example: "A man is speaking clearly and slowly in a large room" is better than "A man is speaking". This can help ensure AudioLDM understands what you want.
Try using different random seeds, which can sometimes affect the generation quality.
Prefer general terms such as 'man' or 'woman' over the names of specific individuals or abstract objects that people may not be familiar with.
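
The seed tip above can be sketched as a small sweep: keep one descriptive prompt fixed and vary only the random seed, then compare the results by ear. This is a minimal sketch, not a platform feature; `seed_sweep` is a hypothetical helper, the `text` and `random_seed` keys simply mirror the input names on this page, and how each request is actually submitted is left out.

```python
import random
from typing import Dict, List


def seed_sweep(text: str, n_variants: int = 4, master_seed: int = 0) -> List[Dict[str, object]]:
    """Build n_variants requests that share a prompt but use distinct seeds.

    A fixed master_seed makes the sweep itself reproducible, so a seed that
    produced a good clip can be found again later.
    """
    if n_variants < 1:
        raise ValueError("n_variants must be at least 1")
    rng = random.Random(master_seed)
    # sample() draws without replacement, so the seeds are guaranteed distinct.
    seeds = rng.sample(range(1_000_000), n_variants)
    return [{"text": text, "random_seed": seed} for seed in seeds]


if __name__ == "__main__":
    prompt = "A man is speaking clearly and slowly in a large room"
    for request in seed_sweep(prompt, n_variants=3):
        print(request)
```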

Model Authors

Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley