The Digital Humanities at MIT combines technology (notably, AI) and the humanities to deepen analyses of human thought and to illuminate relationships between cultures and their modes of expression. In Spring 2021, I worked with Mr. Garo Saraydarian and another MIT student to transcribe oud music from PDF images into MuseScore (.musicxml or .xml) format, and then used the MIT-developed music21 Python module for various analysis tasks, mostly involving prediction.


Mr. Saraydarian's grandfather came from Cyprus and Turkey and, in the early 20th century, owned two corpora of oud music. There are some notational differences between traditional Western and Near Eastern sheet music: sharps and flats can raise or lower a pitch by a quarter tone or even an eighth tone, rather than the usual half tone. The music is annotated in Ottoman Turkish, written in Arabic script, as this was decades before the collapse of the Ottoman Empire and the subsequent effort to Latinize Turkish.
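As an aside on tooling: music21, which we used in the second phase, can represent these microtonal accidentals directly. A minimal sketch, with illustrative note choices, is below.

```python
# A minimal sketch of microtonal accidentals in music21; the specific
# notes here are illustrative, not drawn from the corpora.
from music21 import pitch

p = pitch.Pitch("D4")
p.accidental = pitch.Accidental("half-sharp")  # raise D by a quarter tone
print(p.accidental.alter)  # 0.5, i.e. half of a half tone

q = pitch.Pitch("E4")
q.microtone = 25  # finer shifts are given in cents; 25 cents = an eighth tone
print(q.microtone)
```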


Much of the project's mystique comes from its cultural dimension: translating from another language and interpreting Turkish notation. To Western eyes, the oud seems like a cross between a violin and a guitar: it looks more like the latter and sounds more like the former. The music itself allows sections to repeat, leaving room for spontaneity in how many times a chorus is played.


The first part of the project was manually transcribing the sheet music, over 400 PDF images of it, into MuseScore, free notation software well suited to entering and exporting scores. While I'd like to say my typing speed of 140 wpm at the time¹ helped increase efficiency, I also learned my fair share of MuseScore shortcuts, which cut entry time considerably.


The second part involved parsing the .xml files to extract key signatures, time signatures, and other patterns (e.g., section names), and building prediction tasks with music21. Analyses included withholding notes from the model to see whether it could correctly predict the next phrase or note; guessing the title of a piece (titles were often based on the rhythms and section names it contained) from excerpts of the piece; and deliberately introducing note errors into the sheet music (e.g., changing an "A" to a "C#", or turning a quarter-sharp on a D into a normal D#) to see whether the model could flag them. Sketches of the extraction and error-injection steps, plus a toy prediction baseline, follow.
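Here is a minimal sketch of the extraction and error-injection steps, assuming music21 is installed; the filename and the choice of which note to corrupt are illustrative:

```python
# Hedged sketch: "piece.musicxml" stands in for one of our exported files,
# and the note chosen for corruption is arbitrary.
from music21 import converter, key, meter, note, pitch

score = converter.parse("piece.musicxml")

# Pull out every key and time signature that appears in the score.
key_sigs = list(score.recurse().getElementsByClass(key.KeySignature))
time_sigs = list(score.recurse().getElementsByClass(meter.TimeSignature))
print("keys:", [ks.sharps for ks in key_sigs])         # sharps (+) or flats (-)
print("times:", [ts.ratioString for ts in time_sigs])  # e.g. ['10/8', '4/4']

# Deliberately corrupt one note for the error-flagging task, echoing the
# "A" -> "C#" example above.
notes = list(score.recurse().getElementsByClass(note.Note))
original = notes[0].pitch
notes[0].pitch = pitch.Pitch("C#4")
print(f"corrupted {original} -> {notes[0].pitch}")
```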
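And one illustrative way to frame the next-note prediction task, as a simple bigram baseline; this is a sketch, not the models we actually used:

```python
# Hedged sketch of a bigram next-note predictor; filename illustrative.
from collections import Counter, defaultdict
from music21 import converter, note

score = converter.parse("piece.musicxml")
names = [n.nameWithOctave for n in score.recurse().getElementsByClass(note.Note)]

# Withhold the final note, then count transitions among the rest.
train, held_out = names[:-1], names[-1]
transitions = defaultdict(Counter)
for prev, nxt in zip(train, train[1:]):
    transitions[prev][nxt] += 1

def predict_next(current):
    """Return the most frequent successor of `current`, or None."""
    followers = transitions.get(current)
    return followers.most_common(1)[0][0] if followers else None

print("predicted:", predict_next(train[-1]), "actual:", held_out)
```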


The project culminated in a presentation with the Digital Humanities Laboratory at MIT. Our team's slide deck is below.



Footnotes

¹ …and I have not gotten significantly faster; I am still stuck at 145 wpm. If you have any tips, let me know!