The final result that the user receives from the note detection system is three printed sheets of music. One corresponds to the output of the AMDF (average magnitude difference function) algorithm, which is the most accurate on clean recordings. The other two correspond to the outputs of the two classifiers: the complex tree classifier is the more accurate of the two, while the K nearest neighbor (KNN) classifier is less accurate.
We are able to identify the correct notes with reasonably high accuracy. For example, on a piano chromatic scale, the AMDF output is 100% accurate, the complex tree classifier is 96% accurate, and the K nearest neighbor classifier is 68% accurate.
Most of the observed failures are notes that are plotted higher than they actually are, typically in the next octave up. This was expected, because the confusion matrices generated by the classifiers had already shown this pattern: some notes in the 4th octave had a higher probability of being detected in the 5th octave.
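When the detected sequence has the same length as the ground truth (as in the clean chromatic-scale test), octave errors can be separated from wrong note names by comparing MIDI note numbers (A4 = 69, one step per semitone) both exactly and modulo 12. The sketch below is a hypothetical helper, `score_notes`, with made-up example data; it is not the project's evaluation code and the numbers are not project results.

```python
from typing import Dict, Sequence

def score_notes(true_midi: Sequence[int], pred_midi: Sequence[int]) -> Dict[str, float]:
    """Compare detected MIDI notes with the ground truth, note by note.

    Assumes one entry per segmented note, so both sequences have equal length.
    """
    if len(true_midi) != len(pred_midi):
        raise ValueError("sequences must have equal length")
    n = len(true_midi)
    pairs = list(zip(true_midi, pred_midi))
    exact = sum(t == p for t, p in pairs)
    # Same note name (pitch class) regardless of octave.
    same_name = sum(t % 12 == p % 12 for t, p in pairs)
    # Detected exactly one octave (12 semitones) too high.
    octave_up = sum(p - t == 12 for t, p in pairs)
    return {
        "exact_accuracy": exact / n,
        "note_name_accuracy": same_name / n,
        "octave_up_errors": octave_up,
    }

truth = list(range(60, 72))   # C4..B4 chromatic scale
pred = truth.copy()
pred[3] += 12                 # D#4 detected as D#5
pred[7] += 12                 # G4 detected as G5
print(score_notes(truth, pred))
# {'exact_accuracy': 0.8333333333333334, 'note_name_accuracy': 1.0, 'octave_up_errors': 2}
```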
All detection is based on the main (fundamental) frequency and the first harmonic of each note. The input to the system is a music recording, which we split in the time domain so that each note is in a separate vector that can be analyzed individually. After applying our methods and algorithms, we obtain a vector that indicates the main frequency of each note, in the order the notes appear in the recording. We then convert that vector into MIDI note numbers (a standard numeric encoding of musical pitch) and use those numbers to plot each note on the staff.
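The frequency-to-MIDI step can be sketched with the standard conversion m = 69 + 12·log2(f / 440), rounded to the nearest integer. The project's own conversion code is not shown here, so `freq_to_midi`, `midi_to_name`, and the `a4_hz` reference parameter are illustrative assumptions (the usual convention of MIDI 60 = C4 is also assumed).

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_midi(freq_hz: float, a4_hz: float = 440.0) -> int:
    """Round a detected main frequency to the nearest MIDI note number (A4 = 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(69 + 12 * math.log2(freq_hz / a4_hz))

def midi_to_name(midi: int) -> str:
    """Convert a MIDI note number to a note name with octave, e.g. 69 -> 'A4'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

detected = [261.63, 293.66, 329.63, 440.0]   # example main frequencies in Hz
midi = [freq_to_midi(f) for f in detected]
print(midi)                                  # [60, 62, 64, 69]
print([midi_to_name(m) for m in midi])       # ['C4', 'D4', 'E4', 'A4']
```

Rounding on the logarithmic (semitone) scale means a slightly mistuned note still maps to the intended MIDI number as long as it is within half a semitone of it.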
When the recording was a piano melody, we were able to identify almost all of the notes and plot their sheet music representation with accuracy similar to that achieved on the chromatic scale.
The system is able to detect that a single note is held for a longer period of time. However, the note-printing program does not print it as one long note; instead, it prints a series of short notes whose combined length matches the duration of the held note.
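If the detector represents a held note as several consecutive identical entries (an assumption; the project's internal representation is not shown), one possible way to print it as a single long note is to collapse each run into a (note, duration) pair before plotting. `merge_repeated_notes` is a hypothetical helper, and its duration is counted in detected segments rather than in musical note values.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

def merge_repeated_notes(midi_notes: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse runs of identical MIDI note numbers into (note, duration) pairs.

    Duration is the number of consecutive segments carrying that note
    (hypothetical unit; a score renderer would map it to a note value).
    """
    return [(note, sum(1 for _ in run)) for note, run in groupby(midi_notes)]

print(merge_repeated_notes([60, 60, 60, 62, 64, 64]))  # [(60, 3), (62, 1), (64, 2)]
```

One caveat: the same note struck twice in a row would also be merged, so a real implementation would need onset information from the segmentation step to tell a re-struck note from a held one.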
If the input file is a noisy recording, such as the "Much Noise Piano" file (a piano chromatic scale with people talking in the background), the results are surprising.
In this case it is not possible to calculate the percentage of correct notes, because the output vector is longer than expected: the remaining noise adds "extra" notes to the staff. However, simple observation is enough to see that this time the AMDF is not the most accurate algorithm. The printed notes are in ascending order, but they are not the correct ones. Some notes are missing, even though the segmentation algorithm that runs before the AMDF function detected the correct peaks and split the signal properly, and some notes are added because of noise components.
The classifiers, however, performed better this time, and the complex tree classifier achieved the highest accuracy. It detects almost all of the notes, although the ones at the beginning are produced by the noise, and the ascending scale is recognizable with most of its notes present. The octave errors remain (4th-octave notes are sometimes detected as 5th-octave notes), but the note names are correct.
The KNN Classifier also provides more accurate results than the AMDF algorithm, but they are not as good as the Complex Tree Classifier.
Interpretation
The AMDF, a thoroughly developed external tool, is highly accurate on noise-free signals because it can take in and properly interpret a large amount of data. While we can only reasonably provide two frequencies (the main frequency and the first harmonic) to the classifiers, the AMDF function works on the entire frequency content of the audio recording.
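For readers unfamiliar with the AMDF (average magnitude difference function), a minimal textbook version computes D(tau) = mean(|x[n] - x[n + tau]|) over the samples of one note and takes the lag of a deep minimum as the pitch period. This is a sketch of the general technique, not the external tool used in the project; `amdf`, `amdf_pitch`, and the `fmin`, `fmax`, and `rel_threshold` values are illustrative assumptions.

```python
from typing import Optional

import numpy as np

def amdf(frame: np.ndarray, max_lag: int) -> np.ndarray:
    """D(tau) = mean(|x[n] - x[n + tau]|) for tau = 0..max_lag."""
    n = len(frame)
    d = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        d[tau] = np.mean(np.abs(frame[: n - tau] - frame[tau:]))
    return d

def amdf_pitch(frame: np.ndarray, fs: float, fmin: float = 50.0,
               fmax: float = 2000.0, rel_threshold: float = 0.3) -> Optional[float]:
    """Estimate the main frequency from the first deep AMDF minimum."""
    min_lag = max(1, int(fs / fmax))
    max_lag = min(int(fs / fmin), len(frame) - 2)
    d = amdf(frame, max_lag)
    threshold = rel_threshold * d[min_lag:].max()
    for tau in range(min_lag + 1, max_lag):
        # First local minimum that is clearly below the typical AMDF level.
        if d[tau] < threshold and d[tau] <= d[tau - 1] and d[tau] <= d[tau + 1]:
            # Parabolic interpolation for a sub-sample estimate of the period.
            denom = d[tau - 1] - 2 * d[tau] + d[tau + 1]
            offset = 0.5 * (d[tau - 1] - d[tau + 1]) / denom if denom > 0 else 0.0
            return fs / (tau + offset)
    return None  # no clear periodicity (e.g. silence or pure noise)

fs = 8000
t = np.arange(2048) / fs
print(amdf_pitch(np.sin(2 * np.pi * 440.0 * t), fs))  # close to 440 Hz
```

Taking the first deep minimum rather than the global one is a deliberate choice: the AMDF dips at every multiple of the pitch period, and a global minimum at twice the period would report a pitch one octave too low.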
Although the K nearest neighbor classifier achieves high accuracy on the training data, this accuracy does not necessarily carry over to real recordings, which vary more because instrument tuning is inconsistent. For example, one piano may have its A4 tuned to 450 Hz and another to 430 Hz, whereas the ideal main frequency of A4 is 440 Hz.
Because it bases its decisions on threshold tests over several features, the complex tree classifier can outperform the K nearest neighbor classifier, which decides based only on the distance between feature vectors. Although the K nearest neighbor classifier performs better on the training data, the complex tree classifier appears better suited to the harmonic nature of music signals.
The best option would be to use the AMDF algorithm for well-denoised recordings and the complex tree classifier for noisier ones. We need to keep improving all three algorithms in order to obtain a method that is accurate for both noisy and clean recordings.
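To put the tuning example in perspective, the deviation of a pitch f from a reference f_ref is commonly measured in cents (hundredths of an equal-tempered semitone):

$$
c = 1200 \log_2 \frac{f}{f_{\text{ref}}}, \qquad
1200 \log_2 \frac{450}{440} \approx +38.9, \qquad
1200 \log_2 \frac{430}{440} \approx -39.8 .
$$

Both detunings stay within 50 cents (half a semitone), so rounding to the nearest semitone still gives A4. In absolute terms, however, the main frequency moves by 10 Hz and the first harmonic (near 880 Hz) by 20 Hz, which is the kind of shift a classifier trained on ideally tuned frequencies has to absorb.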
What We Learned
We learned that there are many ways to approach a DSP problem. At first, we mainly considered using tools we learned in class, such as the FFT and spectrograms, to detect notes in an audio recording. Although this approach had some potential, we found that solving the problem purely algorithmically with these "basic" DSP tools would not be an efficient use of our effort. After seeing that our initial approach would not work as well as expected, we decided to consider other DSP tools, which led us to explore more advanced ones such as classifiers and the AMDF. While working with classifiers, we learned how to create a suitable training data set for a machine learning program. We learned how to train a classifier and, once trained, use it to predict test samples. We also learned how to interpret classifier diagnostics such as scatter plots and confusion matrices. We selected several types of classifiers based on their performance and researched each type further. Overall, we learned a great deal about many different DSP tools and how to integrate them.
Additionally, we learned a great deal about the properties of music/sound. Specifically, we found out that the notes that musical instruments play contain many frequencies due to harmonics, and that these harmonics can have greater amplitudes than the fundamental frequency of the sound. This property makes signal processing of musical notes more challenging than one might first think.
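A small synthetic example shows why this property matters for note detection and why a naive spectral approach tends to produce octave errors. It assumes a C4 tone whose first harmonic is twice as strong as its fundamental (illustrative amplitudes, not measured piano data) and simply picks the strongest FFT peak.

```python
import numpy as np

fs = 8000                 # sample rate in Hz (illustrative)
f0 = 261.63               # C4 fundamental in Hz
t = np.arange(fs) / fs    # one second of signal, so FFT bins are 1 Hz apart

# Assumed amplitudes: the first harmonic (2 * f0) is twice as strong
# as the fundamental.
x = 0.5 * np.sin(2 * np.pi * f0 * t) + 1.0 * np.sin(2 * np.pi * 2 * f0 * t)

spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
peak_hz = freqs[np.argmax(spectrum)]
print(peak_hz)  # about 523 Hz (C5), one octave above the true pitch C4
```

The strongest peak sits at twice the fundamental, so reading off the largest spectral peak would place this note in the 5th octave instead of the 4th, the same kind of octave error observed in the results.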