audio-ldm

Need a specific sound for your project? The AudioLDM model generates audio from a text prompt, with adjustable duration and quality settings. It is well suited to filmmakers, video content creators, and podcasters.

Required: 5,000 $wRAI
21k runs
Demo
Examples
Input
  • text: Text prompt from which to generate audio.
  • duration: Duration of the generated audio, in seconds. Longer durations may cause out-of-memory (OOM) errors.
  • guidance_scale: Guidance scale for the model. A larger scale gives better quality and relevance to the text; a smaller scale gives more diversity.
  • random_seed: Random seed for the model (optional).
  • n_candidates: Number of candidate audio clips to generate; the best one is returned.
Input
  • text: docking of two spacecraft
  • duration: 5.0
  • random_seed: 26260
  • n_candidates: 3
  • guidance_scale: 2.5

Input
  • text: stream high in the mountains
  • duration: 5.0
  • random_seed: 682546
  • n_candidates: 3
  • guidance_scale: 2.5
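
The input fields above can be collected into a small validated structure before being submitted. The following is a minimal sketch, not the platform's actual client: the class name AudioLDMInput and the method to_payload are hypothetical, and the default values (5.0 seconds, guidance scale 2.5, 3 candidates) are assumptions taken from the two example inputs rather than documented defaults.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class AudioLDMInput:
    """Hypothetical container mirroring the input fields listed on this page."""

    text: str
    duration: float = 5.0            # seconds; assumed default, taken from the examples
    guidance_scale: float = 2.5      # assumed default, taken from the examples
    random_seed: Optional[int] = None
    n_candidates: int = 3            # assumed default, taken from the examples

    def __post_init__(self) -> None:
        # Basic sanity checks; the real service may enforce different limits.
        if not self.text.strip():
            raise ValueError("text must be a non-empty prompt")
        if self.duration <= 0:
            raise ValueError("duration must be a positive number of seconds")
        if self.guidance_scale < 0:
            raise ValueError("guidance_scale must not be negative")
        if self.n_candidates < 1:
            raise ValueError("n_candidates must be at least 1")

    def to_payload(self) -> dict:
        """Return a plain dict, omitting random_seed when it was not set."""
        payload = asdict(self)
        if payload["random_seed"] is None:
            del payload["random_seed"]
        return payload


# The two example inputs shown on this page.
spacecraft = AudioLDMInput(text="docking of two spacecraft", random_seed=26260)
stream = AudioLDMInput(text="stream high in the mountains", random_seed=682546)

if __name__ == "__main__":
    print(spacecraft.to_payload())
    print(stream.to_payload())
```

Keeping the seed optional matches the field description: leaving it out lets the model pick its own seed, while fixing it makes a result easier to reproduce.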

Readme

AudioLDM generates text-conditional sound effects, human speech, and music. It enables zero-shot text-guided audio style-transfer, inpainting, and super-resolution.

Tricks for Enhancing the Quality of Your Generated Audio

  • Try to use more adjectives to describe your sound. For example: "A man is speaking clearly and slowly in a large room" is better than "A man is speaking". This can help ensure AudioLDM understands what you want.
  • Try using different random seeds, which can sometimes affect the generation quality.
  • Prefer general terms like 'man' or 'woman' over the names of specific individuals, and avoid abstract objects that people may not be familiar with.
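
The tip about trying different random seeds can be sketched as a small helper that prepares several inputs for the same prompt, each with a distinct seed. This is an illustrative assumption rather than part of AudioLDM itself: the function name seed_sweep and the master_seed argument are hypothetical, and only the text and random_seed field names come from the input list on this page.

```python
import random
from typing import Dict, List, Union


def seed_sweep(
    text: str, n_variants: int, master_seed: int = 0
) -> List[Dict[str, Union[str, int]]]:
    """Build n_variants inputs that share one prompt but use distinct seeds.

    master_seed makes the sweep itself reproducible; it is a hypothetical
    convenience and is never sent to the model.
    """
    if n_variants < 1:
        raise ValueError("n_variants must be at least 1")
    rng = random.Random(master_seed)
    # random.sample draws without replacement, so every seed is unique.
    seeds = rng.sample(range(1_000_000), n_variants)
    return [{"text": text, "random_seed": seed} for seed in seeds]


if __name__ == "__main__":
    prompt = "A man is speaking clearly and slowly in a large room"
    for variant in seed_sweep(prompt, n_variants=4):
        print(variant)
```

Each generated input can then be run separately and the results compared by ear, since the seed changes the output while the prompt stays fixed.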

Model Authors

Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley