Music Generation With PyTorch
LSTM Music Generator Documentation
Table of Contents
- Overview
- Requirements
- Getting Started
- Code Explanation
- How It Works
- Limitations
- Potential Improvements
Overview
This program uses a Long Short-Term Memory (LSTM) neural network to learn patterns from MIDI music files and generate new musical sequences. The implementation includes:
- Loading and parsing MIDI files using music21 library
- Preprocessing musical notes into sequences for training
- An LSTM-based neural network architecture
- Training the model to predict the next note in a sequence
- Generating new music by sampling from the model's predictions
Requirements
To run this program, you'll need:
- Python 3.6+
- Required Python packages:
  - torch (PyTorch)
  - numpy
  - music21
  - glob and pickle (included in the Python standard library)
- MIDI files for training (place them in the same directory as the script)
- Optional: CUDA-enabled GPU for faster training (PyTorch will automatically use GPU if available)
Install the third-party packages with pip:

```
pip install torch numpy music21
```
Getting Started
- Place your MIDI files in the same directory as the script (or specify the path)
- Run the script:
```
python music_generator.py
```
- The script will:
  - Load and process the MIDI files
  - Train the LSTM model
  - Generate a new MIDI file called `generated_music.mid`
- Open the generated MIDI file with any music player or DAW
The quality of the generated music depends on:
- The quantity and quality of training MIDI files
- The training parameters (epochs, sequence length, etc.)
- The complexity of the musical patterns in the training data
Code Explanation
Configuration
```python
# Imports used throughout the script
import glob
import pickle
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from music21 import converter, instrument, note, chord, stream

SEQUENCE_LENGTH = 100  # Length of input sequences
BATCH_SIZE = 64        # Number of sequences per batch
EPOCHS = 50            # Number of training epochs
HIDDEN_SIZE = 256      # Size of LSTM hidden layers
NUM_LAYERS = 2         # Number of LSTM layers
LEARNING_RATE = 1e-3   # Learning rate for optimizer
DROPOUT = 0.3          # Dropout rate for regularization
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```
These configuration parameters control the training process and model architecture. You can adjust them based on your needs:
- Increase SEQUENCE_LENGTH to capture longer musical patterns
- Increase HIDDEN_SIZE and NUM_LAYERS for a more complex model (requires more data and computation)
- Adjust LEARNING_RATE if training is unstable or too slow
- Increase EPOCHS for better training (but watch for overfitting)
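For example, a lighter configuration such as the one below (illustrative, untuned values, not recommendations from the script) can be a reasonable starting point for a small collection of MIDI files:

```python
# Illustrative settings for a small dataset -- untuned assumptions
SEQUENCE_LENGTH = 50   # Shorter context window
HIDDEN_SIZE = 128      # Smaller hidden state
EPOCHS = 30            # Fewer passes over the data
```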
Model Architecture
```python
class MusicGenerator(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, dropout):
        super(MusicGenerator, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, dropout=dropout)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden):
        out, hidden = self.lstm(x, hidden)
        out = out[:, -1, :]
        out = self.fc(out)
        return out, hidden

    def init_hidden(self, batch_size):
        return (torch.zeros(self.num_layers, batch_size, self.hidden_size, device=DEVICE),
                torch.zeros(self.num_layers, batch_size, self.hidden_size, device=DEVICE))
```
The model consists of:
- LSTM layers: Process sequential data and maintain hidden state
- Fully connected layer: Maps LSTM output to prediction probabilities
- Hidden state initialization: Provides starting state for LSTM
The forward pass takes an input sequence and hidden state, processes it through the LSTM, and returns predictions for the next note.
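As a quick sanity check, the sketch below (the batch size of 4 and vocabulary size of 10 are arbitrary assumptions, not values from the script) runs a dummy batch through the model and confirms that each sequence yields one logit per vocabulary entry:

```python
# Minimal shape check with made-up sizes: batch of 4, vocabulary of 10
dummy_vocab = 10
model = MusicGenerator(input_size=1, hidden_size=HIDDEN_SIZE, output_size=dummy_vocab,
                       num_layers=NUM_LAYERS, dropout=DROPOUT).to(DEVICE)
hidden = model.init_hidden(batch_size=4)
x = torch.rand(4, SEQUENCE_LENGTH, 1, device=DEVICE)  # (batch, seq_len, features)
out, hidden = model(x, hidden)
print(out.shape)  # expected: torch.Size([4, 10])
```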
Data Preparation
Loading Notes
```python
def load_notes(midi_path="*.mid"):
    notes = []
    files = glob.glob(midi_path)
    for file in files:
        midi = converter.parse(file)
        try:
            # Use the first instrument part if the file is multi-instrument
            parts = instrument.partitionByInstrument(midi)
            elements = parts.parts[0].recurse()
        except Exception:
            # Fall back to a flat list of notes
            elements = midi.flat.notes
        for element in elements:
            if isinstance(element, note.Note):
                notes.append(str(element.pitch))
            elif isinstance(element, chord.Chord):
                notes.append('.'.join(str(n) for n in element.normalOrder))
    return notes
```
This function:
- Finds all MIDI files in the specified path
- Parses each file using music21
- Extracts notes and chords (chords are represented as dot-separated note values)
- Returns a list of all notes/chords in sequence
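The exact tokens depend entirely on your MIDI files; a quick way to inspect the representation is shown below (the values in the comment are made up, purely for illustration):

```python
notes = load_notes()
print(len(notes), notes[:5])
# e.g. 3421 ['E4', 'B3', '0.4.7', 'G#4', '2.6.9']
# Single notes become pitch names; chords become dot-joined pitch classes
```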
Preparing Sequences
```python
def prepare_sequences(notes):
    pitchnames = sorted(set(notes))
    note_to_int = {n: i for i, n in enumerate(pitchnames)}
    n_vocab = len(pitchnames)

    network_input = []
    network_output = []
    for i in range(len(notes) - SEQUENCE_LENGTH):
        seq_in = notes[i:i + SEQUENCE_LENGTH]
        seq_out = notes[i + SEQUENCE_LENGTH]
        network_input.append([note_to_int[n] for n in seq_in])
        network_output.append(note_to_int[seq_out])

    n_patterns = len(network_input)
    network_input = np.array(network_input).reshape(n_patterns, SEQUENCE_LENGTH, 1) / float(n_vocab)
    network_output = np.array(network_output)
    return network_input, network_output, note_to_int
```
This function:
- Creates a vocabulary of unique notes/chords
- Maps each note/chord to an integer
- Creates input sequences of SEQUENCE_LENGTH and corresponding output (next note)
- Normalizes input between 0 and 1
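The sliding window is easiest to see on a toy example; the sketch below uses a hypothetical sequence length of 3 instead of the script's SEQUENCE_LENGTH of 100:

```python
# Toy illustration of the sliding window (sequence length of 3 for readability)
toy_notes = ['C4', 'E4', 'G4', 'C5', 'E4']
seq_len = 3
for i in range(len(toy_notes) - seq_len):
    print(toy_notes[i:i + seq_len], '->', toy_notes[i + seq_len])
# ['C4', 'E4', 'G4'] -> C5
# ['E4', 'G4', 'C5'] -> E4
```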
Training Process
```python
def train_network():
    notes = load_notes()
    network_input, network_output, note_to_int = prepare_sequences(notes)
    n_vocab = len(note_to_int)

    X = torch.from_numpy(network_input).float().to(DEVICE)
    y = torch.from_numpy(network_output).long().to(DEVICE)

    model = MusicGenerator(input_size=1, hidden_size=HIDDEN_SIZE, output_size=n_vocab,
                           num_layers=NUM_LAYERS, dropout=DROPOUT).to(DEVICE)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

    n_batches = len(X) // BATCH_SIZE
    model.train()
    for epoch in range(1, EPOCHS + 1):
        epoch_loss = 0.0
        hidden = model.init_hidden(BATCH_SIZE)
        for b in range(n_batches):
            start = b * BATCH_SIZE
            end = start + BATCH_SIZE
            inputs = X[start:end]
            targets = y[start:end]

            optimizer.zero_grad()
            outputs, hidden = model(inputs, hidden)
            hidden = (hidden[0].detach(), hidden[1].detach())
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()

        avg_loss = epoch_loss / n_batches if n_batches else 0
        print(f"Epoch {epoch}/{EPOCHS} Loss: {avg_loss:.4f}")

    torch.save(model.state_dict(), 'music_generator.pth')
    with open('note_to_int.pickle', 'wb') as f:
        pickle.dump(note_to_int, f)
```
The training process:
- Loads and prepares the data
- Initializes the model, loss function, and optimizer
- Trains in batches for the specified number of epochs
- Saves the trained model and note-to-int mapping
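A typical entry point is assumed to simply run training and then generation; a minimal sketch:

```python
if __name__ == '__main__':
    train_network()    # Saves music_generator.pth and note_to_int.pickle
    generate_music()   # Loads both files and writes generated_music.mid
```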
Music Generation
```python
def generate_music(model_path='music_generator.pth',
                   note_dict_path='note_to_int.pickle', gen_length=500):
    with open(note_dict_path, 'rb') as f:
        note_to_int = pickle.load(f)
    int_to_note = {i: n for n, i in note_to_int.items()}

    model = MusicGenerator(input_size=1, hidden_size=HIDDEN_SIZE,
                           output_size=len(note_to_int), num_layers=NUM_LAYERS,
                           dropout=0).to(DEVICE)
    model.load_state_dict(torch.load(model_path, map_location=DEVICE))
    model.eval()

    notes = load_notes()
    network_input, _, _ = prepare_sequences(notes)
    start_idx = np.random.randint(0, len(network_input))
    pattern = list((network_input[start_idx] * len(note_to_int)).astype(int).flatten())

    generated = []
    hidden = model.init_hidden(1)
    for _ in range(gen_length):
        seq = np.array(pattern[-SEQUENCE_LENGTH:]).reshape(1, SEQUENCE_LENGTH, 1) / float(len(note_to_int))
        seq_tensor = torch.from_numpy(seq).float().to(DEVICE)
        with torch.no_grad():
            output, hidden = model(seq_tensor, hidden)
            hidden = (hidden[0].detach(), hidden[1].detach())
        probs = nn.functional.softmax(output.view(-1), dim=0).cpu().numpy()
        index = np.random.choice(range(len(note_to_int)), p=probs)
        pattern.append(index)
        generated.append(int_to_note[index])

    output_notes = []
    for token in generated:
        if '.' in token:
            parts = token.split('.')
            notes_in_chord = [note.Note(int(p)) for p in parts]
            for n in notes_in_chord:
                n.storedInstrument = instrument.Piano()
            new_chord = chord.Chord(notes_in_chord)
            output_notes.append(new_chord)
        else:
            new_note = note.Note(token)
            new_note.storedInstrument = instrument.Piano()
            output_notes.append(new_note)

    midi_stream = stream.Stream(output_notes)
    midi_stream.write('midi', fp='generated_music.mid')
```
The generation process:
- Loads the trained model and note mappings
- Selects a random starting sequence from the training data
- Generates new notes one at a time by:
- Feeding the current sequence through the model
- Sampling from the output probabilities
- Adding the new note to the sequence
- Converts the generated notes back to MIDI format
- Saves the result as a MIDI file
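A common refinement of the sampling step (not part of the script above) is temperature sampling: dividing the logits by a temperature before the softmax makes the output more conservative (temperature < 1) or more varied (temperature > 1). A minimal sketch of a drop-in helper:

```python
def sample_with_temperature(output, temperature=0.8):
    """Sample a vocabulary index from the model's logits, scaled by a temperature."""
    scaled = output.view(-1) / temperature
    probs = nn.functional.softmax(scaled, dim=0).cpu().numpy()
    return int(np.random.choice(len(probs), p=probs))
```

Inside the generation loop, `index = sample_with_temperature(output)` would replace the softmax and `np.random.choice` lines.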
How It Works
The program works by learning statistical patterns in sequences of musical notes:
- Pattern Recognition: The LSTM learns which notes/chords tend to follow other notes/chords
- Sequence Prediction: Given a sequence of notes, the model predicts probabilities for the next note
- Creative Generation: By sampling from these probabilities and feeding predictions back as input, the model generates new sequences
This approach is similar to how language models generate text, but applied to musical notes instead of words.
Limitations
- Simple representation: Only captures pitch information (no duration, velocity, etc.)
- Short-term patterns: Limited by the SEQUENCE_LENGTH parameter
- Quality depends on training data: Needs diverse, high-quality MIDI files
- No musical structure: Doesn't explicitly model musical form (verse, chorus, etc.)
Potential Improvements
- Add timing information: Include note durations in the model (a sketch of one approach follows this list)
- Multi-track generation: Model different instruments/voices
- Transformer architecture: Replace LSTM with a more modern architecture
- Conditional generation: Generate in specific styles or keys
- Post-processing: Apply music theory rules to improve results
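For the timing improvement, one lightweight approach is to fold each note's duration into its vocabulary token, so the rest of the pipeline stays unchanged. The sketch below uses a hypothetical load_notes_with_duration() helper and music21's duration.quarterLength attribute; the generation side would then split each token on '_' and set the note's quarterLength accordingly.

```python
def load_notes_with_duration(midi_path="*.mid"):
    """Sketch: like load_notes(), but encodes pitch and duration in one token."""
    notes = []
    for file in glob.glob(midi_path):
        midi = converter.parse(file)
        try:
            parts = instrument.partitionByInstrument(midi)
            elements = parts.parts[0].recurse()
        except Exception:
            elements = midi.flat.notes
        for element in elements:
            if isinstance(element, note.Note):
                notes.append(f"{element.pitch}_{element.duration.quarterLength}")
            elif isinstance(element, chord.Chord):
                pitches = '.'.join(str(n) for n in element.normalOrder)
                notes.append(f"{pitches}_{element.duration.quarterLength}")
    return notes  # tokens look like 'C4_0.5' or '0.4.7_1.0'
```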