MG²: Melody Is All You Need For Music Generation
We present the Melody Guided Music Generation (MG²) model, a novel approach and the first to use melody to guide music generation. Despite a simple method and extremely limited resources, it achieves excellent performance.
Anyone can use this model to generate personalized background music for their short videos on platforms like TikTok, YouTube Shorts, and Meta Reels. Additionally, it is very cost-effective to fine-tune the model with your own private music dataset.
Note: The online service is not yet stable, so please be patient. We are working to improve its stability and to refine the model checkpoint for better performance.
Model Structure
Main Results
Here we present the two main results of our experiments, text-to-music generation and multi-modality alignment, which demonstrate the effectiveness of MG² on objective metrics.
Demo
Table 1: Examples of prompts randomly generated by ChatGPT-4o
Prompt | MG² | AudioLDM2-music | Mustango |
---|---|---|---|
1. Reggae song with a laid-back groove and uplifting, positive lyrics. | |||
2. Dark techno track with deep basslines and hypnotic, repetitive beats. | |||
3. Lo-fi hip-hop beat with mellow piano chords and a relaxed, jazzy vibe. | |||
4. Violin performance with a sentimental and nostalgic atmosphere. | |||
5. Hip-hop beat with deep bass and sharp electronic samples. | |||
6. Orchestral composition with sweeping strings and dramatic crescendos. | |||
7. Funky groove with slap bass, tight drums, and soulful brass sections. | |||
8. Generate a high quality, smooth, and slow paced piece of classical music. | |||
Table 2: Examples of prompts in MusicCaps
Prompt | MG² | AudioLDM2-music | Mustango |
---|---|---|---|
1. This is a funk/disco music piece. There is a male vocalist singing melodically in the lead. The melody is being played by the keyboard with the backing of the electric guitar and the bass guitar. The rhythm is being played by the acoustic drums. The atmosphere is groovy. This piece could be played at a retro-themed party at a dance club. | |||
2. This song contains a plucked string instrument playing a melody in the higher register along with an acoustic guitar strumming chords on the backbeat and an upright bass playing a simple melody along to the lead melody. This song may be playing in a video-presentation. | |||
3. This composition contains an upright bass playing softly along to a harp and strings playing a melody while a male deep voice is softly singing a melody sounding like telling a story. The song sounds like it was made for Christmas. This song may be playing at home having dinner with the whole family. | |||
4. This audio contains someone playing a piece on cello ranging from the low register up into the higher register. This song may be playing during a live performance. | |||
5. The song is an instrumental. The song is slow tempo with gentle drum brushes, percussion, bass guitar solo, and piano accompaniment gently. The song is groovy and emotional. The song is possibly a Christian worship song or a smooth jazz song. The audio quality is average. | |||
6. The Electro Pop song features a flat female vocal, occasionally supported by wide background female doubling vocals, singing over quiet drums, groovy and boomy bass, arpeggiated synth melody and some sound effects of the airplane and the explosion. In the second part, the drums cut through the mix more, therefore they are more audible, while the new elements appear, including shimmering bells and simple hi hats. Sounds like a low quality recording, especially because of that first part of the loop. | |||
7. The high-quality, smooth, and clear quality recording features a live performance of a big band that consists of shimmering hi hats, steel pan melody, punchy snare, soft kick hits, bright brass melody, groovy double bass melody and cowbell percussion. The recording is a bit noisy and it sounds happy and fun. | |||
Questionnaire
Table 3: Guess which clips were generated by MG² (choose none, one, or several); the others are clips from existing songs.
1. | 2. | 3. | 4. | 5. |
Table 4: The music clips used in our human evaluation (all of them generated by MG²)
Prompt | Audio |
---|---|
1. A synthwave music track filled with the retro vibes of the 1980s, featuring powerful, punchy synth tones and a strong, rhythmic beat. | |
2. A piece of music that begins with a gentle piano solo, gradually building in intensity over time. | |
3. A slow and gentle piano piece that evokes a sense of calm and relaxation. | |
4. A piece of music where a flowing violin melody intertwines seamlessly with soft orchestral tones. | |
5. A relaxed hip-hop beat accompanied by soft piano chords, creating a laid-back jazz atmosphere. | |
6. A piece of music featuring a gentle saxophone melody accompanied by a soft background, perfect for serene moments. | |
7. A piece of music with a smooth, jazzy trumpet improvisation, blended with a light drum rhythm, ideal for late-night listening. | |