Automatic Transcription of Piano Music

Using advanced auditory signal processing techniques, researchers are now focusing on the transcription of polyphonic piano music. Automatic music transcription (AMT) is generally performed on digital computers and operates on a frame-by-frame basis. The resulting transcriptions can be used to recompose the original sounds, imitate them, or simply reproduce them.

There are two major approaches to AMT: one analyzes each frame on its own, the other combines results from separate frames. The latter highlights useful information, whereas the former provides no indication of where the algorithm has failed.
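As a rough illustration of the second approach, the sketch below combines per-frame decisions for a single pitch by median filtering across neighbouring frames. The median filter, the function name smooth_activations, and the toy input are assumptions chosen for illustration; the text does not say which combination rule is used.

```python
def smooth_activations(frames, width=3):
    """Median-filter a sequence of 0/1 frame decisions for one pitch.

    `frames` holds hypothetical frame-wise detections (1 = note present).
    Windows are truncated at the edges; the lower median is used for
    even-sized windows.
    """
    half = width // 2
    smoothed = []
    for i in range(len(frames)):
        lo = max(0, i - half)
        hi = min(len(frames), i + half + 1)
        window = sorted(frames[lo:hi])
        smoothed.append(window[(len(window) - 1) // 2])
    return smoothed


# Toy frame-wise output: one spurious detection (frame 1) and one
# dropped frame (frame 6) inside a sustained note.
framewise = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0]
combined = smooth_activations(framewise, width=3)
print(combined)  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
```

In this toy case the isolated detection is removed and the gap inside the sustained note is filled, which is the kind of correction a purely frame-by-frame analysis cannot make.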

AMT requires an understanding of the relationship between signal models and auditory sensations. The signal model is defined as a set of discrete values, whereas the auditory sensations are related to the actions of particular musical gestures. A number of techniques are used to accurately model the signal behavior that triggers a specific sensation.

The most basic technique is the detection of individual notes, which is often used to detect the notes in chords. The problem of multiple-pitch estimation can be addressed with probabilistic spectral smoothness principles. Traditional hidden Markov model techniques, by contrast, require consideration of a large number of chord hypotheses. The frequency components of a single frame are likewise estimated with a variety of techniques.

Using non-negative matrix factorization and non-negative sparse coding, a spectrogram is decomposed into its constituent notes. The signal is then modelled by hidden Markov models to determine the magnitude of the piano signal segment. This technique can be complemented by a regularized minimization method to minimize the number of hidden Markov models. The resulting model can be used to detect the musical structures in the signal and predict the presence of notes.
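The decomposition step can be sketched with a small non-negative matrix factorization. The example below is a minimal sketch that assumes the standard multiplicative updates for the Euclidean cost and a synthetic six-bin spectrogram built from two made-up note spectra; the function name nmf, the iteration count, and the toy data are illustrative assumptions, not the method of any particular system.

```python
import numpy as np


def nmf(V, n_components, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (bins x frames) as W @ H.

    W holds one spectral template per column, H the activation of each
    template in each frame. Uses multiplicative updates for the
    Euclidean cost, which keep both factors non-negative.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy magnitude spectrogram: two hypothetical note spectra that sound
# in different (partly overlapping) frames.
note_a = np.array([1.0, 0.0, 0.5, 0.0, 0.0, 0.0])
note_b = np.array([0.0, 1.0, 0.0, 0.0, 0.5, 0.0])
activations = np.array([[1.0, 1.0, 0.0, 0.0, 1.0],
                        [0.0, 1.0, 1.0, 0.0, 0.0]])
V = np.outer(note_a, activations[0]) + np.outer(note_b, activations[1])

W, H = nmf(V, n_components=2)
error = np.linalg.norm(V - W @ H)
print("reconstruction error:", error)
```

Each row of H can then be read as a per-frame activation for one learned template, which is the quantity a later stage such as a hidden Markov model would operate on.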

Another approach to AMT is to perform the analysis in both the frequency domain and the time domain. This approach uses a combination of vector classification and peak-picking techniques. The resulting signal model is represented as a linear combination of pre-stored piano waveforms. This technique can be applied to a number of different music genres. For example, the Bach10 dataset contains recordings of chorales by J. S. Bach performed on a variety of instruments, together with the corresponding ground truths.
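The peak-picking step mentioned above can be sketched as a search for local maxima in one frame's magnitude spectrum. The function name pick_peaks, the threshold, the toy spectrum, and the sample rate and FFT size used to convert bins to hertz are all assumptions made for illustration.

```python
def pick_peaks(magnitudes, threshold):
    """Return indices of local maxima whose magnitude exceeds `threshold`."""
    peaks = []
    for k in range(1, len(magnitudes) - 1):
        if (magnitudes[k] > threshold
                and magnitudes[k] > magnitudes[k - 1]
                and magnitudes[k] >= magnitudes[k + 1]):
            peaks.append(k)
    return peaks


# Hypothetical magnitude spectrum of a single frame (11 bins, i.e. the
# non-negative frequencies of a 20-point FFT).
spectrum = [0.1, 0.2, 0.9, 0.3, 0.1, 0.15, 0.6, 0.2, 0.05, 0.12, 0.08]
peak_bins = pick_peaks(spectrum, threshold=0.5)

# Convert bin indices to frequencies under an assumed 8 kHz sample rate.
sample_rate = 8000
n_fft = 20
peak_hz = [k * sample_rate / n_fft for k in peak_bins]
print(peak_bins)  # [2, 6]
print(peak_hz)    # [800.0, 2400.0]
```

A real system would use a much longer FFT and would typically refine the peak positions, but the selected bins play the same role: they are the candidate partials passed on to the classification stage.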

Non-negative sparse coding can also be used on its own to identify polyphonic notes automatically. This technique uses a regularized minimization method to reduce the number of hidden Markov models and to minimize the number of note candidates for each frame. The resulting signal model is also used to determine the frequency components of a single frame.
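How a regularization term trims the note candidates in each frame can be sketched with a fixed dictionary of spectral templates and an L1 penalty on the activations. The sketch below assumes a commonly used multiplicative update for non-negative sparse coding with the dictionary held fixed; the function name sparse_activations, the penalty weight, and the three toy templates are illustrative assumptions.

```python
import numpy as np


def sparse_activations(V, W, sparsity=0.1, n_iter=500, eps=1e-9):
    """Estimate non-negative activations H so that W @ H approximates V.

    Minimizes 0.5 * ||V - W @ H||^2 + sparsity * sum(H) over H >= 0 with
    the dictionary W held fixed, using a multiplicative update.
    """
    H = np.full((W.shape[1], V.shape[1]), 0.5)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
    return H


# Hypothetical dictionary: two note templates plus a third, ambiguous
# template that overlaps with the fundamentals of both notes.
note_a = np.array([1.0, 0.0, 0.5, 0.0])
note_b = np.array([0.0, 1.0, 0.0, 0.5])
ambiguous = np.array([0.3, 0.3, 0.0, 0.0])
W = np.column_stack([note_a, note_b, ambiguous])

# Two frames: note A alone, then notes A and B together.
V = np.column_stack([note_a, note_a + note_b])

H = sparse_activations(V, W)
print(np.round(H, 3))
```

With this toy data the penalty drives the ambiguous template's activation towards zero in both frames, leaving only the templates of the notes that are actually present, at slightly shrunken amplitudes.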