Design of an MP3 Analyzer
First let’s play catch-up with our WWF project. We have met all the original requirements, and the biologists have reported sufficient accuracy on the test data. For this first build I extended an open source MP3 decoder named MAD, along with some open code that drives that library: madxlib. Out of the box, this code takes an MP3 file as input and spits out a WAV. The signal detection required no binary output, so the first step was to remove the final steps in the driver loop that dumped the (huge) decoded data onto disk.
Some investigation showed that I could analyze the MP3 data before it gets synthesized into raw PCM (Pulse Code Modulation) format. So we go ahead and blow away all that synthesis code. What is left to work with is a relatively simple data structure that MAD shuttles the raw MP3 data into. A single frame of MP3 data is (after other manipulations) a collection of coefficients of an MDCT (Modified Discrete Cosine Transform) applied to the original PCM data. The transform has moved the data from the time domain to the frequency domain; however, all the values are still relative amplitudes. The coefficients spread the sound data across frequency bands, but since we don’t care which frequency a pulse shows up in, we can just scan them all. Fantastic! So, iterate over all the coefficients of a frame, and if any of those values lies between the thresholds we’ve set, we have a confirmed pulse.
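The per-frame scan described above can be sketched in a few lines of C. This is only an illustration under some assumptions: the coefficients are taken to have already been copied out of MAD’s frame structure into a flat array of doubles, the comparison is done on magnitudes (the coefficients can be negative), and the name `frame_has_pulse` is made up for this example rather than taken from MAD or madxlib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of MAD: returns true if any coefficient
 * magnitude in one frame lies inside the closed range [lo, hi].
 * Assumes the coefficients were already converted to doubles. */
bool frame_has_pulse(const double *coeffs, size_t count, double lo, double hi)
{
    assert(lo <= hi);

    for (size_t i = 0; i < count; ++i) {
        /* Compare on magnitude (an assumption of this sketch). */
        double magnitude = coeffs[i] < 0.0 ? -coeffs[i] : coeffs[i];

        if (magnitude >= lo && magnitude <= hi) {
            return true;   /* one hit is enough to flag the frame */
        }
    }
    return false;
}
```

Because the frequency of the hit doesn’t matter, the loop can bail out on the first coefficient inside the window, which keeps the cost per frame low.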
Following the initial success of the first build, the WWF have stepped up their request. So far, I’ve delivered an app that opens up an MP3, processes it, and dumps out a flat data file. There is virtually no GUI. New requirements include data entry and several reports on the raw data. Users must enter the “animals” (that is, the RFID tags being scanned for), how long each one is scanned, and the scanning order. The tag data has to be correlated with the raw data in the reports. The threshold signal strengths will be user-configurable, and scan session metadata will also be entered and stored.
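One way to correlate a detected pulse with a tag is to turn the scan order and durations into a schedule and look up which slot was active at the pulse’s timestamp. The sketch below is a guess at how that could work, not the delivered code: it assumes the receiver cycles through the tags in the entered order and then repeats, that timestamps are in seconds from the start of the session, and the names `scan_slot` and `slot_at_time` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One entry in the user-entered scan order (hypothetical layout). */
struct scan_slot {
    int tag_id;         /* identifier of the tag / animal */
    double dwell_secs;  /* how long this tag is scanned on each pass */
};

/* Returns the index of the slot active at time t (seconds since the
 * session started), assuming the schedule repeats end to end.
 * Returns -1 for an empty schedule, a zero-length cycle, or negative t. */
int slot_at_time(const struct scan_slot *slots, size_t count, double t)
{
    double cycle = 0.0;
    for (size_t i = 0; i < count; ++i) {
        assert(slots[i].dwell_secs >= 0.0);
        cycle += slots[i].dwell_secs;
    }
    if (count == 0 || cycle <= 0.0 || t < 0.0) {
        return -1;
    }

    /* Offset within the current pass; truncation equals floor for t >= 0.
     * Assumes t / cycle fits in a long long. */
    double offset = t - cycle * (double)(long long)(t / cycle);

    double elapsed = 0.0;
    for (size_t i = 0; i < count; ++i) {
        elapsed += slots[i].dwell_secs;
        if (offset < elapsed) {
            return (int)i;
        }
    }
    return (int)count - 1;  /* guard against floating-point rounding */
}
```

With a lookup like this, each row of the flat data file can be stamped with a tag at report time, so the schedule can be corrected later without re-processing the MP3.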
From top to bottom this new application will be like a trip back in time. I’m using a C# GUI (yes, it’s my first one, but don’t worry, I’m a quick study), which references a C++ DLL, which in turn wraps a C library.
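To give a feel for the bottom of that stack, here is a rough sketch of the kind of flat, C-callable surface a wrapper layer can sit on: an opaque handle plus a handful of functions that take only plain types. Everything in it is an assumption made for illustration. The `analyzer_*` names, the struct layout, and the idea of feeding frames as arrays of doubles are not the actual interface of MAD, madxlib, or my DLL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical analyzer state; callers only ever hold a pointer to it. */
typedef struct analyzer {
    double lo, hi;              /* user-configurable thresholds */
    unsigned long frames_seen;
    unsigned long pulses_seen;  /* frames that contained a pulse */
} analyzer;

/* Allocates an analyzer; returns NULL if allocation fails. */
analyzer *analyzer_create(double lo, double hi)
{
    assert(lo <= hi);
    analyzer *a = malloc(sizeof *a);
    if (a != NULL) {
        a->lo = lo;
        a->hi = hi;
        a->frames_seen = 0;
        a->pulses_seen = 0;
    }
    return a;
}

/* Scans one frame of coefficients and updates the running counts. */
void analyzer_feed_frame(analyzer *a, const double *coeffs, size_t count)
{
    a->frames_seen++;
    for (size_t i = 0; i < count; ++i) {
        double magnitude = coeffs[i] < 0.0 ? -coeffs[i] : coeffs[i];
        if (magnitude >= a->lo && magnitude <= a->hi) {
            a->pulses_seen++;   /* count each frame at most once */
            return;
        }
    }
}

unsigned long analyzer_pulse_count(const analyzer *a)
{
    return a->pulses_seen;
}

void analyzer_destroy(analyzer *a)
{
    free(a);
}
```

Keeping the boundary down to a pointer and primitive types means the layers above never need to know how the state is laid out, which tends to make crossing from managed to native code less painful.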

I’m tying it all together with Visual Studio 2005 and its version of the .NET Framework, so in the process of coding this I’ve been using a lot of the new functionality. One of these toys is the extended typed DataSet with “Drag Once Databinding”. I figure this gem shaved about ten hours of coding off this project. Plus, working with MS Access doesn’t easily allow for the use of stored procedures, and I’ll be damned if I ever write another line of dynamic SQL. Here is a picture of some peccaries the Areas Project group sent me: