Python Music Information Retrieval Reproducible Research (PyMIR³)

The field of music information retrieval (MIR) lacked a framework for creating research that other people could easily reproduce and extend. My friend Tiago, who was pursuing a PhD in the area at the time, identified this problem and started building a framework that would be easy to use. After some time, I got interested in the problem and helped build the framework now called PyMIR³, available on GitHub.

Design objectives

We identified a few objectives the framework should meet so that researchers in this area could actually use it. Other frameworks exist, but each had some problem that, in our view, prevented it from being widely adopted. So we defined the main objectives for the software as:

  1. It should be easy to change. MIR researchers include people who don’t have much programming experience, so the learning curve should be as smooth as possible, allowing more people to contribute.
  2. It should be easy to use. Again, we should lower the bar for what people need to know to make the best use of PyMIR³. Although these two objectives may sound the same, we’ll show that they lead to two related but different design decisions.
  3. It should be easy to reproduce results. After all, the name of the framework has “reproducible research” in it, so it’s a given. This imposes some interesting design choices that we will explain later.

Before entering into details regarding the modules, I must talk about the data structure shared among all modules, so that the I/O has some basic structure.

Data format

PyMIR³ provides many data formats (and users are welcome to create their own if required). Each one is only able to perform generic computations on the data it stores. For instance, the Spectrogram class can map a time instant to a time slice, but can’t compute the spectrogram itself.

This logical separation allows the same classes to be used with many modules without carrying an overwhelming number of methods that are really data transformations rather than accessors.
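To make the accessor-only idea concrete, here is a minimal sketch of what such a data class could look like. It is not the actual PyMIR³ Spectrogram: the attribute frame_rate and the methods frame_index and slice_at are names I made up for illustration.

```python
import numpy as np


class Spectrogram:
    """Hypothetical data class: it stores a spectrogram but never computes one."""

    def __init__(self, data, frame_rate):
        self.data = np.asarray(data)   # assumed layout: (n_bins, n_frames)
        self.frame_rate = frame_rate   # assumed metadata: frames per second

    def frame_index(self, time_s):
        """Map a time instant in seconds to the index of its time slice."""
        index = int(time_s * self.frame_rate)
        # Clamp to the valid range so out-of-bounds times hit the edge frames.
        return min(max(index, 0), self.data.shape[1] - 1)

    def slice_at(self, time_s):
        """Return the spectrum column that covers the given time instant."""
        return self.data[:, self.frame_index(time_s)]


spec = Spectrogram(np.arange(12).reshape(3, 4), frame_rate=2.0)
column = spec.slice_at(1.2)  # third frame (index 2) of the toy matrix
```

Anything that would create or transform the matrix (an FFT, a filter bank) would live in a module, not in this class.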

All classes are derived from BaseObject, which provides generic and optimized methods for saving and loading any data in PyMIR³.

The save method is able to extract all NumPy matrices to save them separately from the rest of the data, since there are specialized functions for that. Each array is replaced by a placeholder for restoration to the right place when loading. Then the data structure is compressed before storing to reduce both file size and I/O time.

The load method is able to reverse what the save method does. However, there is no guarantee that the object type in the file is the same as the one we are trying to load, so the method takes care of checking it beforehand, allowing the user to be sure that the loaded data has the expected format.
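The save/load round trip described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the code in PyMIR³: I picked pickle, gzip and np.savez for the job, only top-level attributes are scanned for arrays, and the placeholder format and the Spectrogram and Evaluation subclasses are invented for the example.

```python
import gzip
import io
import os
import pickle
import tempfile

import numpy as np


class BaseObject:
    """Sketch of a base class with generic save/load (not PyMIR³'s own code)."""

    def __init__(self, data=None, metadata=None):
        self.data = data
        self.metadata = metadata or {}

    def save(self, path):
        arrays, state = {}, {}
        # Swap each NumPy array for a placeholder and set the array aside.
        for name, value in vars(self).items():
            if isinstance(value, np.ndarray):
                key = f"array_{len(arrays)}"
                arrays[key] = value
                state[name] = ("__array__", key)
            else:
                state[name] = value
        # Arrays go through NumPy's own writer; everything else is pickled.
        buffer = io.BytesIO()
        np.savez(buffer, **arrays)
        payload = {"type": type(self).__name__, "state": state,
                   "arrays": buffer.getvalue()}
        # Compress the whole structure before it reaches the disk.
        with gzip.open(path, "wb") as handle:
            pickle.dump(payload, handle)

    @classmethod
    def load(cls, path):
        with gzip.open(path, "rb") as handle:
            payload = pickle.load(handle)
        # Refuse files that hold a different kind of object.
        if payload["type"] != cls.__name__:
            raise TypeError(f"expected {cls.__name__}, found {payload['type']}")
        arrays = np.load(io.BytesIO(payload["arrays"]))
        obj = cls.__new__(cls)
        for name, value in payload["state"].items():
            is_placeholder = (isinstance(value, tuple) and len(value) == 2
                              and value[0] == "__array__")
            # Put each array back where its placeholder was left.
            setattr(obj, name, arrays[value[1]] if is_placeholder else value)
        return obj


class Spectrogram(BaseObject):
    pass


class Evaluation(BaseObject):
    pass


path = os.path.join(tempfile.mkdtemp(), "spectrogram.gz")
Spectrogram(np.ones((2, 3)), {"sampling_rate": 44100}).save(path)
restored = Spectrogram.load(path)

try:
    Evaluation.load(path)          # wrong type on purpose
    wrong_type_rejected = False
except TypeError:
    wrong_type_rejected = True
```

The type check is what lets a module trust its input: loading a spectrogram file as an Evaluation fails immediately instead of producing nonsense later on.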

Besides this save/load feature, every data object is composed of the actual data and its metadata, which will be described later.

Easy to change

Good software design is our friend here! Modularity makes it easy to add new blocks to the existing framework, so that the person only has to know about his/her block (besides possibly the blocks with which it interacts).

Each module in PyMIR³ is composed of a class with 4 basic methods (it can have more, but this is our basic recommendation):

  1. A method that returns a help string saying what the module does;
  2. A method that describes the arguments this specific module allows/requires;
  3. A method to perform the actual computation for the module;
  4. A method to glue command-line arguments to the computation.

Although the separation between (3) and (4) isn’t required, it allows the module to be called directly from within Python using the method defined in (3), which may speed up the application as data doesn’t have to be loaded/stored at each step. Besides, it’s always nice to separate the logical components for maintenance purposes.
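A module following the four-method recommendation could look like the sketch below. The class Gain and the method names get_help, build_arguments, compute and run are hypothetical, chosen only to mirror items (1) to (4); the real modules read and write files in the glue method, which I left out to keep the example short.

```python
import argparse


class Gain:
    """Hypothetical module that scales a list of samples."""

    def get_help(self):
        # (1) Help string saying what the module does.
        return "Multiply every sample by a constant factor"

    def build_arguments(self, parser):
        # (2) Arguments this specific module allows/requires.
        parser.add_argument("factor", type=float, help="scaling factor")
        parser.add_argument("samples", type=float, nargs="+", help="input samples")

    def compute(self, samples, factor):
        # (3) Pure computation: callable directly from Python.
        return [factor * sample for sample in samples]

    def run(self, args):
        # (4) Glue: unpack the parsed command-line arguments and delegate.
        return self.compute(args.samples, args.factor)


module = Gain()
parser = argparse.ArgumentParser(description=module.get_help())
module.build_arguments(parser)
result = module.run(parser.parse_args(["2", "1", "2.5"]))
```

Because compute knows nothing about argparse, a Python script can chain several modules in memory and only touch the disk at the very end.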

Right out of the gate, PyMIR³ has several common modules that can be used as templates for users to build their own, making it even easier to add new commands. PyMIR³ provides three main types of modules: debugging modules, computing modules and submodules.

Debugging modules

These modules are used to extract information about the data resulting from PyMIR³ computations. Some of them are specific to one data type, like the one that just prints the spectrogram, but one is a major hack built on the data format described earlier: it can open any type of object and spill all the data and metadata it contains, making it an easy-to-use and very powerful tool for debugging.

Computing modules

These are the modules that actually perform the computation inside PyMIR³. They process some file(s) given as input and save the result to some file(s). Hence the developer doesn’t have to worry about how the blocks interact, since they should primarily do so through files.


Submodules

These are just helper modules that define the hierarchy of modules, making it easier for users to find the command they want, as explained later.

Automatic module registration

Some readers may have noticed that I said a module should have the 4 methods and must be a class. Nothing more. They might think “But hey, how do I register my module so it can be used?”. After all, frameworks usually (all the others I know of, at least) require the programmer to register the module. PyMIR³ doesn’t.

Besides the cool data format, this is one other thing that amazes me in PyMIR³: if i) your module can be imported by Python; ii) it is in the “modules” directory; and iii) its submodule hierarchy is correctly named (a folder named “my_modules” should have an associated submodule named “”; just follow what is already in the framework as an example), PyMIR³ will find it. I guarantee you this.

I’ve built a method that, each time PyMIR³ is called from the command line, traverses the submodule tree and builds all the command-line arguments and structure for the modules. Hence new modules are found automatically and are ready to use.
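For the curious, the traversal can be approximated with the standard library alone. The sketch below is my simplified take, not the method in PyMIR³: the function discover, the package name demo_modules and the rule "a class with a run method is a module" are all assumptions made for the example, and the package is created in a temporary directory so the snippet runs anywhere.

```python
import importlib
import inspect
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover(package_name):
    """Return {dotted module name: [class names]} for every module in a package."""
    package = importlib.import_module(package_name)
    found = {}
    for info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        if info.ispkg:
            continue  # folders only shape the hierarchy
        module = importlib.import_module(info.name)
        classes = [
            name
            for name, obj in inspect.getmembers(module, inspect.isclass)
            # Keep classes defined in this file that follow the assumed convention.
            if obj.__module__ == module.__name__ and hasattr(obj, "run")
        ]
        if classes:
            found[info.name] = classes
    return found


# Build a throwaway package on disk to stand in for the "modules" directory.
root = Path(tempfile.mkdtemp())
leaf_dir = root / "demo_modules" / "supervised"
leaf_dir.mkdir(parents=True)
(root / "demo_modules" / "__init__.py").write_text("")
(leaf_dir / "__init__.py").write_text("")
(leaf_dir / "knn.py").write_text(
    "class KNN:\n    def run(self, args):\n        return 'knn'\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the freshly written files are seen
registry = discover("demo_modules")
```

Dropping a new file next to knn.py is enough for it to show up in registry on the next call, which is the whole point: no registration step.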

Easy to use

Since the modules should communicate mainly through files (although the user is free to program in Python and skip the command-line interface), as explained earlier, a user just has to know how to run commands on the command line, which is fairly easy. It shouldn’t be hard to implement a GUI for PyMIR³, but we don’t provide one (you are more than welcome to contribute!).

When calling PyMIR³, the user provides the hierarchical path of submodules until a leaf is reached, and the leaf’s arguments are then interpreted. This makes it easier for users to know what to call, and even more so if they don’t! Want to do something with supervised learning? Just call PyMIR³ with the supervised submodule and the help for all its modules is shown. This allows the user to explore the command tree until a leaf is reached, which then provides its own help.
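A hierarchical command line like this maps naturally onto nested argparse subparsers. The following is a rough sketch under my own assumptions: the TREE dictionary, the program name pymir3-demo and the knn, svm and spectrogram leaves are invented, and the real command names in PyMIR³ may differ.

```python
import argparse

# Hypothetical command tree: nested dicts are submodules, strings are leaf help.
TREE = {
    "supervised": {
        "knn": "k-nearest neighbours classifier",
        "svm": "support vector machine classifier",
    },
    "features": {
        "spectrogram": "compute a spectrogram",
    },
}


def add_commands(parser, tree, depth=0):
    """Recursively attach one level of subcommands per level of the tree."""
    subparsers = parser.add_subparsers(dest=f"level{depth}")
    for name, child in tree.items():
        if isinstance(child, dict):
            add_commands(subparsers.add_parser(name), child, depth + 1)
        else:
            subparsers.add_parser(name, help=child, description=child)


parser = argparse.ArgumentParser(prog="pymir3-demo")
add_commands(parser, TREE)

args = parser.parse_args(["supervised", "knn"])
path = [args.level0, args.level1]
```

With this layout, passing --help after supervised makes argparse list the knn and svm leaves with their help strings, and passing it after a leaf shows that leaf’s own description.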

Since even the command line can be a little daunting, we provide lots of shell scripts showing complete examples of how to call PyMIR³ to perform some task on a data set.

Easy to reproduce

The fact that PyMIR³ is so modular and saves every intermediate result in files should make the results reproducible enough, but we went one step further and stored the whole computing path in the metadata.

When describing the data format, I didn’t talk specifically about the metadata because it fits better here. Usual metadata for something like a spectrogram would be the parameters (e.g. sampling rate) that the user passed to it, but we also store the original file name and its hash. If the input files are PyMIR³ data objects, we store their metadata too. We do this in every one of our modules (and user-created modules are encouraged to do the same!), so the metadata of a result object, such as an Evaluation, has the whole history of operations and data used to arrive at the result.
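Here is a small sketch of how such nested metadata builds up from one step to the next. The dictionary layout, the helper names file_record and step, the parameter names and the choice of SHA-256 are my assumptions for the example; the actual metadata fields in PyMIR³ may be organized differently.

```python
import hashlib


def file_record(name, content):
    """Describe a raw input by its name and the SHA-256 hash of its bytes."""
    return {"file": name, "sha256": hashlib.sha256(content).hexdigest()}


def step(operation, params, inputs):
    """Build the metadata of one result, nesting the metadata of every input."""
    return {"operation": operation, "params": params, "inputs": inputs}


# Each stage embeds the full metadata of the stage that fed it.
audio = file_record("song.wav", b"stand-in for the audio bytes")
spectrogram_meta = step("spectrogram", {"sampling_rate": 44100}, [audio])
features_meta = step("features", {"n_bands": 20}, [spectrogram_meta])
evaluation_meta = step("evaluation", {"metric": "f1"}, [features_meta])

# Walking down the inputs recovers the original file behind the final result.
origin = evaluation_meta["inputs"][0]["inputs"][0]["inputs"][0]
```

The final object carries its entire ancestry, so nothing about the pipeline has to be remembered or documented separately.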

If someone tries to reproduce a piece of research and doesn’t get the same results, s/he only has to check where the metadata of his/her result differs from that of the original author’s result. This avoids the major issue of getting different results and having no idea which step to blame. As the whole history is there, even if a shell script that can be used to reproduce the research isn’t available (although withholding it isn’t a nice thing to do), the sequence can be rebuilt.
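Finding the step to blame then boils down to comparing two nested histories. The function below is a hypothetical helper, not something shipped with PyMIR³, and it assumes metadata shaped as dictionaries with an "inputs" list holding the metadata of each input.

```python
import copy


def first_difference(mine, theirs, path="result"):
    """Describe the earliest step where two metadata histories diverge."""
    my_inputs = mine.get("inputs", [])
    their_inputs = theirs.get("inputs", [])
    # Look at the inputs first, so the report points at the earliest cause.
    for i, (a, b) in enumerate(zip(my_inputs, their_inputs)):
        found = first_difference(a, b, f"{path}.inputs[{i}]")
        if found:
            return found
    for key in sorted(set(mine) | set(theirs)):
        if key != "inputs" and mine.get(key) != theirs.get(key):
            return f"{path}.{key}: {mine.get(key)!r} != {theirs.get(key)!r}"
    if len(my_inputs) != len(their_inputs):
        return f"{path}.inputs: different number of inputs"
    return None


original = {
    "operation": "evaluation",
    "params": {"metric": "f1"},
    "inputs": [{
        "operation": "spectrogram",
        "params": {"window": 1024},
        "inputs": [{"file": "song.wav", "sha256": "abc"}],
    }],
}

# The reproduction used a different window size somewhere upstream.
replica = copy.deepcopy(original)
replica["inputs"][0]["params"]["window"] = 2048

report = first_difference(replica, original)
```

Here the report points at the params of the spectrogram step, which is exactly the information someone debugging a failed reproduction needs.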

Moreover, this history is recorded even if the user doesn’t use the command line, since the modules can do this for data objects in memory too. Although this approach doesn’t prevent one from faking a metadata history, it makes it easier for people to check that the results are valid, since even fake metadata will lead the reproducing researcher to a different result at some point.


From our perspective, we’ve successfully designed a framework that deals with all our concerns. Tiago uses it frequently in his research and I’m thinking about taking some things, like the DataObject and module auto-discovery, and turning them into libraries, as I think they are of more general use.

If you are interested in PyMIR³, I’d recommend checking it out on GitHub and giving it a try. We’d love to hear some feedback!