Members of the Audionamix research team will attend the 11th annual International Conference on Latent Variable Analysis and Signal Processing (LVA/ICA) in Tel-Aviv, Israel (http://events.ortra.com/lva/). LVA/ICA is the only conference dedicated to the models of mixtures of latent variables, a topic encompassing modern audio source separation techniques in which Audionamix is involved. Audionamix will present a paper there (Bayesian non-negative matrix factorization with learned temporal smoothness prior), and will also chair a special session on speech and audio processing.
LVA/ICA 2012 next week!
March 5th, 2012
Audionamix Research @ Interspeech 2011 in Florence, Italy
August 11th, 2011
Audionamix research team members will attend the Interspeech conference. This major event on speech science and technology will be held in Florence, Italy, from the 27th to the 31st of August 2011.
If you would like to meet with us, please contact us at research (at) audionamix.com.
Convolved Common Audio Signal Extraction
June 8th, 2011
These sound examples correspond to the submitted article with the above title.
The examples are selections out of all multichannel mixtures. The goal is to separate the music and effects (mfx) contribution from the dialog contribution in movie soundtracks.
This problem can be seen as a common signal extraction task, where the common signal is the mfx contribution.
Our previous method, based on geometric common signal extraction, gave conclusive results under the hypothesis that the music and effects tracks are exactly the same in the different versions.
The method proposed in the present article addresses a more realistic case by handling mfx tracks that differ in equalization (or filtering) from one version to another.
We present here the results described in the article, for the convolved case only. The different mfx tracks for each version have been individually filtered before being mixed with the corresponding dialog tracks.
We present the results for N=3 and N=5 versions. Note that the N-SP-SUB method only estimates one mfx track.
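The convolved setting described above can be illustrated with a minimal sketch: a common mfx signal is filtered differently for each version and then added to that version's dialog. The function name, the random stand-in signals, and the short FIR filters are assumptions for illustration only; the article's actual filters and data are not given here.

```python
import numpy as np

def make_convolved_mixtures(mfx, dialogs, filters):
    """Build one mixture per version: filtered mfx plus that version's dialog.

    mfx     : 1-D array, the common music-and-effects signal
    dialogs : list of 1-D arrays (same length as mfx), one per version
    filters : list of 1-D FIR impulse responses, one per version (hypothetical)
    """
    mixtures = []
    for dialog, h in zip(dialogs, filters):
        # Filter the common signal, then truncate to the original length.
        filtered_mfx = np.convolve(mfx, h)[: len(mfx)]
        mixtures.append(filtered_mfx + dialog)
    return np.stack(mixtures)

rng = np.random.default_rng(0)
n_versions, n_samples = 3, 1000
mfx = rng.standard_normal(n_samples)                      # stand-in for the mfx track
dialogs = [rng.standard_normal(n_samples) for _ in range(n_versions)]
# Version 1 uses an identity filter; the others use short random FIR filters.
filters = [np.array([1.0])] + [0.3 * rng.standard_normal(8) for _ in range(n_versions - 1)]
mixtures = make_convolved_mixtures(mfx, dialogs, filters)
```

With an identity filter, a version reduces to the unfiltered case assumed by the earlier geometric method, which is why the first row of `mixtures` equals `mfx + dialogs[0]`.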
N=5 channels
| Original | mfx1 | mfx2 | mfx3 | mfx4 | mfx5 | dialog1 | dialog2 | dialog3 | dialog4 | dialog5 |
| Mixed | - | - | - | - | - | Mix 1 | Mix 2 | Mix 3 | Mix 4 | Mix 5 |
| Results: | | | | | | | | | | |
| N-SP-SUB | common mfx | - | - | - | - | dialog1 | dialog2 | dialog3 | dialog4 | dialog5 |
| CCNMF | mfx1 | mfx2 | mfx3 | mfx4 | mfx5 | dialog1 | dialog2 | dialog3 | dialog4 | dialog5 |
N=3 channels
| Original | mfx1 | mfx2 | mfx3 | dialog1 | dialog2 | dialog3 |
| Mixed | - | - | - | Mix 1 | Mix 2 | Mix 3 |
| Results: | | | | | | |
| N-SP-SUB | common mfx | - | - | dialog1 | dialog2 | dialog3 |
| CCNMF | mfx1 | mfx2 | mfx3 | dialog1 | dialog2 | dialog3 |
Geometric Multichannel Common Signal Separation With Application to Music and Effects Extraction from Film Soundtracks
November 29th, 2010
These sound examples correspond to the submitted article with the above title.
The examples are selections out of all multichannel mixtures (5 per number of channels).
Example with N=3 input channels
| Original | Music + Effects | Dialog 1 | Dialog 2 | Dialog 3 |
| Mixed | - | Mix 1 | Mix 2 | Mix 3 |
| Results: | ||||
| Median | Music + Effects | - | - | - |
| Cone | Music + Effects | - | - | - |
| N-SP / min | Music + Effects | Dialog 1 | Dialog 2 | Dialog 3 |
| N-SP-SUB | Music + Effects | Dialog 1 | Dialog 2 | Dialog 3 |
Example with N=4 input channels
| Original | Music + Effects | Dialog 1 | Dialog 2 | Dialog 3 | Dialog 4 |
| Mixed | - | Mix 1 | Mix 2 | Mix 3 | Mix 4 |
| Results: | |||||
| Median | Music + Effects | - | - | - | - |
| Cone | Music + Effects | - | - | - | - |
| N-SP / min | Music + Effects | Dialog 1 | Dialog 2 | Dialog 3 | Dialog 4 |
| N-SP-SUB | Music + Effects | Dialog 1 | Dialog 2 | Dialog 3 | Dialog 4 |
Example with N=5 input channels
| Original | Music + Effects | Dialog 1 | Dialog 2 | Dialog 3 | Dialog 4 | Dialog 5 |
| Mixed | - | Mix 1 | Mix 2 | Mix 3 | Mix 4 | Mix 5 |
| Results: | ||||||
| Median | Music + Effects | - | - | - | - | - |
| Cone | Music + Effects | - | - | - | - | - |
| N-SP / min | Music + Effects | Dialog 1 | Dialog 2 | Dialog 3 | Dialog 4 | Dialog 5 |
| N-SP-SUB | Music + Effects | Dialog 1 | Dialog 2 | Dialog 3 | Dialog 4 | Dialog 5 |
Adaptation of source-specific dictionaries in Non-Negative Factorization for Source Separation
November 26th, 2010
Submitted to the ICASSP 2011 conference
This paper concerns the adaptation of spectrum dictionaries in audio source separation with supervised learning. Assuming that samples of the audio sources to separate are available, a filter adaptation in the frequency domain is proposed in the context of Non-Negative Matrix Factorization with the Itakura-Saito divergence. The algorithm retrieves the acoustical filter applied to the sources with good accuracy, and demonstrates significantly higher performance on separation tasks than the non-adaptive model.
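To make the idea of frequency-domain filter adaptation concrete, here is a hedged sketch. It is not the article's algorithm: it assumes a single source, a fixed pre-learned dictionary `W`, and a model `V ≈ diag(g) W H` in which one gain `g` per frequency bin plays the role of the filter. Both `g` and the activations `H` are updated with multiplicative rules derived from the Itakura-Saito divergence. All names and the toy data are hypothetical.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all time-frequency bins."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def adapt_filter_is_nmf(V, W, n_iter=200, eps=1e-12, seed=0):
    """Fit V ~ diag(g) @ W @ H with W fixed, updating g and H multiplicatively."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    K = W.shape[1]
    H = rng.random((K, N)) + 0.1      # random positive initial activations
    g = np.ones(F)                    # start from a flat (identity) filter
    history = []
    for _ in range(n_iter):
        # Update activations H with the filtered dictionary held fixed.
        Wg = g[:, None] * W
        V_hat = Wg @ H + eps
        H *= (Wg.T @ (V / V_hat**2)) / (Wg.T @ (1.0 / V_hat) + eps)
        # Update the per-frequency gain g with W and H held fixed.
        WH = W @ H
        V_hat = g[:, None] * WH + eps
        g *= np.sum(WH * V / V_hat**2, axis=1) / (np.sum(WH / V_hat, axis=1) + eps)
        history.append(is_divergence(V, g[:, None] * (W @ H) + eps))
    return g, H, history

# Toy data: a known dictionary observed through a hypothetical low-pass-like gain.
rng = np.random.default_rng(1)
W = rng.random((20, 4)) + 0.05
H_true = rng.random((4, 50)) + 0.05
g_true = np.linspace(1.0, 0.2, 20)
V = g_true[:, None] * (W @ H_true)
g, H, history = adapt_filter_is_nmf(V, W)
```

Note that `g` and `H` share a scale ambiguity in this model, so only the shape of the estimated gain is meaningful, not its absolute level.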
Experiments and Results
We choose two different classes of instruments to test our approach: two polyphonic instruments, piano and guitar, and one monophonic instrument, bass. (Bass can be polyphonic, but we only address its monophonic usage here.)
The tracks come from real multi-track recordings, so the instruments are expected to play in synchrony and in harmony. The training signal for each source is built from samples of the RWC database [1], and consists of a concatenation of the whole range of notes of one single instrument per source.
The test data is taken from a commercial recording from which the separated tracks have been made available.
We generate a two-source 0 dB mono mixture which contains the source to separate and another available source (drums).
This leads to three different tests:
- Piano test. Source 1: piano, source 2: drums.
- Guitar test. Source 1: guitar, source 2: drums.
- Bass test. Source 1: bass, source 2: drums.
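The 0 dB mixing step used in these tests can be sketched as follows. The sketch assumes that "0 dB" means the two sources have equal average power in the mono mixture; the function name and the random stand-in signals are hypothetical.

```python
import numpy as np

def mix_at_0_db(source, interferer):
    """Rescale `interferer` so both signals have equal power, then sum them."""
    p_source = np.mean(source**2)
    p_interf = np.mean(interferer**2)
    gain = np.sqrt(p_source / p_interf)
    scaled = gain * interferer
    return source + scaled, scaled

rng = np.random.default_rng(0)
piano = rng.standard_normal(44100)          # stand-in for the source to separate
drums = 0.2 * rng.standard_normal(44100)    # stand-in for the drum track
mixture, drums_scaled = mix_at_0_db(piano, drums)
# Source-to-interferer power ratio in dB after rescaling.
ratio_db = 10 * np.log10(np.mean(piano**2) / np.mean(drums_scaled**2))
```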
We compare the separation results with and without filter adaptation. These examples are the ones with the higher source-to-distortion ratio (SDR), corresponding to Table 2 in the article.
| | Original Mixture | Extracted source w/ filter adaptation | Extracted source w/o filter adaptation |
| Piano Test | mix | with filter | without filter |
| Guitar Test | mix | with filter | without filter |
| Bass Test | mix | with filter | without filter |
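For readers unfamiliar with the SDR figure mentioned above, the sketch below computes the plain energy-ratio form of a source-to-distortion ratio. How the article computes its SDR is not stated on this page, and common evaluation toolkits use a more elaborate decomposition of the error, so treat this as a simplified illustration with hypothetical names.

```python
import numpy as np

def simple_sdr(reference, estimate):
    """Energy ratio (in dB) between the reference and the estimation error."""
    error = reference - estimate
    return 10 * np.log10(np.sum(reference**2) / np.sum(error**2))

t = np.arange(1000) / 1000.0
reference = np.sin(2 * np.pi * 5 * t)
estimate = 0.9 * reference          # estimate with a 10% amplitude error
sdr_db = simple_sdr(reference, estimate)
```

Here the error is one tenth of the reference in amplitude, so the ratio comes out at about 20 dB.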
[1] M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka, “RWC music database: Music genre database and musical instrument sound database”, in Proc. International Conference on Music Information Retrieval (ISMIR), Baltimore, USA, 2003.
Sound Enhancement using sparse approximation with Speclets
March 12th, 2010
Accepted at the ICASSP 2010 conference
This paper presents an innovative approach to the informed enhancement of damaged sound. It uses sparse approximations with a learned dictionary of atoms that model the main components of the undamaged source spectra. The decomposition process aims to find which atoms could make up the undamaged source, in order to recover it. The damaged signal is decomposed with a Matching Pursuit algorithm, which involves an adaptation of the dictionary learned on undamaged sources. The technique has been evaluated on synthetic signals, and encouraging results are reported for a real trumpet signal.
Experiments and Results
For synthetic signals, the original signal is a harmonic series with a fundamental frequency of 400 Hz and 40 partials with exponentially decreasing amplitudes, so it has energy up to 16 kHz. The signal is also temporally windowed by a Gaussian window.
The learning process is conducted on a set of similar signals that differ only in their fundamental frequency.
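A signal of the kind described above can be generated with the short sketch below. The 400 Hz fundamental and the 40 partials come from the text; the sample rate, duration, decay rate, and Gaussian window width are assumptions chosen only to make the example runnable.

```python
import numpy as np

def synthetic_note(f0=400.0, n_partials=40, sr=44100, duration=1.0,
                   decay=0.1, window_width=0.15):
    """Harmonic series with exponentially decaying partial amplitudes,
    shaped in time by a Gaussian window."""
    t = np.arange(int(sr * duration)) / sr
    k = np.arange(1, n_partials + 1)                 # partial indices 1..n_partials
    amplitudes = np.exp(-decay * (k - 1))            # exponential amplitude decay
    partials = amplitudes[:, None] * np.sin(2 * np.pi * f0 * k[:, None] * t[None, :])
    signal = partials.sum(axis=0)
    # Gaussian window centered in the middle of the note.
    window = np.exp(-0.5 * ((t - duration / 2) / (window_width * duration)) ** 2)
    return signal * window

note = synthetic_note()
highest_partial_hz = 400.0 * 40      # frequency of the 40th partial
```

A training set in the spirit of the text could be built by calling `synthetic_note` with several values of `f0` while keeping the other parameters fixed.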
The results are the following:
For one note:
synthetic_damaged_fc6k_destructif
synthetic_restored_fc6k_destructif
For a mix of two notes:
synthetic_2notes_damaged_fc5k_destructif
synthetic_2notes_recovered_fc5k_destructif
The original signal sounds like a buzz. The damaged signal is the same, except that we intentionally destroyed the frequencies above 8 kHz, so it sounds like a heavily filtered version of the original. This damage is destructive, which means that all partials above 8 kHz are actually removed, not just attenuated.
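One simple way to reproduce this kind of destructive damage is to zero every FFT bin above the cutoff, as sketched below. This brick-wall approach is an assumption; the page does not say how the damage was actually applied, and the two-partial test signal is hypothetical.

```python
import numpy as np

def destructive_lowpass(signal, sr, cutoff_hz):
    """Zero every FFT bin above `cutoff_hz` (removal, not attenuation)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

sr = 44100
t = np.arange(sr) / sr
# Two partials: one below and one above the 8 kHz cutoff.
low_partial = np.sin(2 * np.pi * 400 * t)
original = low_partial + 0.5 * np.sin(2 * np.pi * 12000 * t)
damaged = destructive_lowpass(original, sr, cutoff_hz=8000)
```

After the operation only the 400 Hz partial remains, so nothing above the cutoff can be recovered by simple equalization.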
The enhancement method recreates the missing parts of the spectrum by replacing atoms from the processed dictionary with the corresponding full-band versions of the same atoms. It can easily be heard that the reconstituted signal is very close to the original one, and that a large part of the missing frequencies has been reconstructed.
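The atom-substitution idea can be sketched on a toy example. The damaged spectrum is decomposed by matching pursuit over band-limited atoms, and the selected atoms are then swapped for their full-band counterparts at resynthesis. The random non-negative atoms, the index-based pairing between band-limited and full-band atoms, and all names are assumptions for illustration; the learned dictionary of the paper is not reproduced here.

```python
import numpy as np

def matching_pursuit(x, D, n_iter=10):
    """Greedy matching pursuit over unit-norm dictionary columns."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        correlations = D.T @ residual
        k = int(np.argmax(np.abs(correlations)))     # best-matching atom
        coeffs[k] += correlations[k]
        residual -= correlations[k] * D[:, k]        # remove its contribution
    return coeffs, residual

rng = np.random.default_rng(0)
n_bins, n_atoms, cutoff_bin = 64, 8, 32
full_atoms = rng.random((n_bins, n_atoms))   # stand-in for learned full-band atoms
low_atoms = full_atoms.copy()
low_atoms[cutoff_bin:, :] = 0.0              # band-limited ("processed") versions
norms = np.linalg.norm(low_atoms, axis=0)
low_unit = low_atoms / norms                 # unit-norm atoms for the pursuit

original = 2.0 * full_atoms[:, 3]            # undamaged spectrum built from one atom
damaged = original.copy()
damaged[cutoff_bin:] = 0.0                   # destructive damage above the cutoff

coeffs, _ = matching_pursuit(damaged, low_unit, n_iter=3)
# Resynthesize with the full-band atoms, undoing the unit-norm scaling.
restored = full_atoms @ (coeffs / norms)
```

In this idealized case the damaged spectrum is exactly one band-limited atom, so the substitution restores the upper half of the spectrum exactly; real signals would only be approximated.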
The results are the following:
For a single trumpet note:
carioca_1note_dicodifferent_damaged_fc6k_destructif
carioca_1note_dicodifferent_recovered_fc6k_destructif
For a full phrase: