Introduction to AI Beat Creation: Part 1


Kicking off my AI music project

This is part 1 of my blog series on fine-tuning Meta's MusicGen to create sample packs for Infamous Beats.

The series will cover data exploration, feature analysis, modeling, and experimentation until I have a working model.

I will explain my project with Infamous Beats through Daniel Bourke's ML and DS framework.

Conception


I wanted to make a real ML project that uses real data, helps real people, and is not a copycat of some tutorial.

Infamous Beats and I both saw creative potential in AI and I pitched building something for his business.

We both loved creating images with Stable Diffusion so I thought I could be his "AI image consultant".

I was thinking about generating videos for his YouTube channel and making song cover images.

Realistically, my 3060 Ti's video generation time is longer than my patience can endure.

This eventually led me to find Udio, and then I knew what I wanted to create:

A machine learning model that can generate music that resembles my friend's music and that he can use for his business.

Research


Figure: flowchart depicting the steps in a full machine learning project: data collection, which leads to data modeling involving problem definition, data acquisition, evaluation, feature selection, modeling, and experiments. This is an iterative process that eventually leads to deployment.

Daniel Bourke's framework is how I will focus my efforts so that I won't constantly switch my approach.

Problem Definition


A machine learning model that can generate quality music or instrumentals that he can use for his business.

This led me to find a bunch of resources that could be useful for generating audio:

  1. AudioCraft by Meta

  2. MusicGen - Hugging Face Space

  3. TrainingDocs for AudioCraft

  4. TensorFlow Docs for music generation with an RNN

  5. TorchAudio Feature Extraction

  6. GANSynth: Making Music with GANs

  7. Model Zoo - WaveGAN Pytorch model

The sources I found gave me an idea of the work and the skills needed to get my initial results.

My project definition in ML terms would be...

Create an unsupervised machine learning model by fine-tuning an existing pre-trained generative model on a dataset of 1046 example songs.

What Data do I have?


After a phone call with my project partner, we identified a good starting set of songs.

He has an online store that lets users buy music instrumentals for their songs.

His store is hosted on Airbit's marketplace, and he has more than 1,000 songs listed for purchase.

You may be asking: "damn! 1000 songs, good luck downloading all of those."

Well yes, that would suck but I am lazy (some may say efficient) so I am going to automate it!

Automation!


I think web scraping is such a cool way to use programming. There are a lot of grey areas surrounding what is legal, generally accepted, or even just plain wrong.

The only area where I think web scraping is used for evil is scraping tickets/shoes/GPUs to resell them.

Come on man, be better.

I do think there are many cool and interesting ways you can use web scraping, even if you are not tech-savvy:

  1. Google Chrome extensions.

  2. Plenty of websites.

  3. Even software that will write the code for you.

My script to download the songs from Airbit utilizes Playwright's Python library.

A basic overview of the workflow that I have:

Figure: flowchart depicting the steps of an Airbit scraper process: 1. Start Airbit Scraper and log into Airbit. 2. Scrape download link. 3. Repeat the process. 4. Generate a CSV file with links. 5. Read links from the CSV file on an ML Desktop. 6. Download songs into a dataset/folder.

  1. Use Playwright's browser library to log into Infamous Beats' account.

  2. Navigate to the store page.

  3. Find the download link for the top song in the table.

  4. Download each of the following links on the page.

  5. Go to the next page.

  6. Repeat.
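A minimal sketch of the six steps above, assuming Playwright's synchronous Python API. The URLs, the CSS selectors, and the function names are placeholders invented for illustration, not Airbit's real page structure and not the actual script. To keep it short, the sketch only collects the download links on each page instead of downloading the files.

```python
from urllib.parse import urljoin

# Hypothetical URLs: stand-ins for the real login and store pages.
LOGIN_URL = "https://example.com/login"
STORE_URL = "https://example.com/store"


def absolute_links(base_url, hrefs):
    """Resolve hrefs against the page URL and drop empties and duplicates, keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


def scrape_download_links(email, password, max_pages=50):
    """Log in, walk the store pages, and return every download link found."""
    # Imported here so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    hrefs = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # Step 1: log in (selectors are assumptions).
        page.goto(LOGIN_URL)
        page.fill("input[name='email']", email)
        page.fill("input[name='password']", password)
        page.click("button[type='submit']")
        page.wait_for_load_state("networkidle")

        # Step 2: go to the store page.
        page.goto(STORE_URL)

        for _ in range(max_pages):
            # Steps 3-4: grab every download link in the table on this page.
            hrefs.extend(
                page.locator("a.download").evaluate_all(
                    "els => els.map(e => e.getAttribute('href'))"
                )
            )
            # Steps 5-6: move to the next page until there is none.
            next_button = page.locator("a.next-page")
            if next_button.count() == 0:
                break
            next_button.first.click()
            page.wait_for_load_state("networkidle")

        browser.close()
    return absolute_links(STORE_URL, hrefs)
```

Calling `scrape_download_links("me@example.com", "secret")` would need Playwright and its Chromium build installed, plus real selectors in place of the placeholder ones.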

My initial thought was I could do this all in Python. I technically can, but I found a method on Instagram that might be faster.

Using a Chrome extension called Insta Data Scraper I could scrape the songs from the website and have the extension handle all the nuances.

Instead of downloading each song, I logged into his Airbit account and then used Insta Data Scraper to scrape all the href links to a CSV file. Now my workflow looked a bit like this:

I know a lot is going on in this drawing but if you follow the steps it's not too complicated.

The download process took quite some time, but I wanted to make sure the website wouldn't kick me for requesting hundreds of downloads at a time.

After letting this run for 3 hours, I was able to get all of my audio data!
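The "read links from the CSV, then download slowly" part could look something like the sketch below. It uses only the standard library. The column name `download_url`, the output folder, and the delay are assumptions made for the example; the real CSV headers depend on what the extension exported. If all 1046 songs were fetched in the 3 hours, that works out to roughly 10 seconds per song, download time included.

```python
import csv
import io
import time
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def read_links(csv_text, column="download_url"):
    """Return the non-empty values of one column (column name is hypothetical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row.get(column) or "").strip()
        for row in reader
        if (row.get(column) or "").strip()
    ]


def local_name(url, index):
    """Build a stable file name from the URL path, ignoring any query string."""
    name = Path(urlparse(url).path).name or "track"
    return f"{index:04d}_{name}"


def download_all(links, out_dir="dataset", delay_seconds=10.0):
    """Download each link into out_dir, pausing between requests."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for index, url in enumerate(links, start=1):
        target = out / local_name(url, index)
        if target.exists():
            continue  # already downloaded, so a restart picks up where it stopped
        urllib.request.urlretrieve(url, str(target))
        time.sleep(delay_seconds)  # be polite: one request at a time, with a pause
```

Usage would be along the lines of `download_all(read_links(Path("links.csv").read_text()))`, where `links.csv` is a hypothetical file name for the exported CSV.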

Evaluation & Features


Insta Data Scraper was able to scrape more than just the song download links.

The CSV data has the cover art of the song, the title of the song, and the download links for every audio format offered by the music platform.

The additional data could be useful for future improvements, but it is not needed for a minimum viable product.
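Since the CSV carries one download link per audio format, the minimum viable product only needs a title and one link per song. A small sketch of that reduction follows. The column names (`title`, `wav_url`, `mp3_url`) and the preference for WAV over MP3 are assumptions for illustration, not the real export headers.

```python
import csv
import io

# Hypothetical format columns, ordered from most to least preferred.
FORMAT_COLUMNS = ["wav_url", "mp3_url"]


def build_manifest(csv_text):
    """Keep the title and the first available format link for each song."""
    manifest = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = ""
        for column in FORMAT_COLUMNS:
            candidate = (row.get(column) or "").strip()
            if candidate:
                url = candidate
                break
        if url:  # skip rows that have no usable download link
            title = (row.get("title") or "").strip()
            manifest.append({"title": title, "url": url})
    return manifest
```

The cover art column is simply ignored here, so it stays available in the original CSV for later experiments.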

The evaluation metric I will be trying to improve is going to be pretty subjective.

Determining if a song is good or has the right vibe would be hard to quantify for a computer.

I was thinking of trying a platform like Prodigy to let Infamous Beats annotate or give feedback on the samples the model generates.

After a conversation with ChatGPT, I figured there could be a better way.

A hybrid training architecture could involve:

  1. Fine-tune MusicGen with the training subset of my data.

  2. Tweak parameters to get a decent quality of outputs.

  3. Listen to each sample and save the feedback into a new set of training data.

  4. Use a pre-trained GPT to tokenize the text feedback on the audio.

  5. Feed that feedback back into the model to generate better outputs.

This plan could make no sense but it seems like something that could be possible.
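Step 3 of that plan (listen to each sample and save the feedback) is the easiest part to pin down. One way it could be sketched is as an append-only JSON Lines log. The record fields, the 1-to-5 rating scale, and the function names are all assumptions for the example, and nothing here touches the model itself.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Feedback:
    sample_path: str  # where the generated audio clip was saved
    prompt: str       # text prompt used to generate it
    rating: int       # hypothetical scale: 1 (unusable) to 5 (ready to use)
    notes: str        # free-form listener comments


def append_feedback(log_path, feedback):
    """Append one feedback record as a single JSON line."""
    if not 1 <= feedback.rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(feedback)) + "\n")


def load_feedback(log_path, min_rating=1):
    """Read the log back, keeping records rated at least min_rating."""
    records = []
    with open(log_path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                record = json.loads(line)
                if record["rating"] >= min_rating:
                    records.append(record)
    return records
```

A log like this keeps the subjective part (does it have the right vibe?) in a form that later steps could filter or turn into training text.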

Next Steps


Everything past this point will build on learning about ML architectures, whether I end up creating a model or building off of a pre-trained one.

Experimentation and tracking progress are vital moving forward once I start training and using these songs.

I will be writing more in-depth posts on the concepts I will be learning, tutorials on solving issues where I learned a lot, and publishing Hugging Face Spaces or Gradio apps that people can try out online.

Thank you for making it this far and see you next time!