LVA/ICA 2012 next week!

March 5th, 2012

Members of the Audionamix research team will attend the 11th annual International Conference on Latent Variable Analysis and Signal Processing (LVA/ICA) in Tel-Aviv, Israel (http://events.ortra.com/lva/). LVA/ICA is the only conference dedicated to models of mixtures of latent variables, a topic that encompasses the modern audio source separation techniques Audionamix works on. Audionamix will present a paper there ("Bayesian non-negative matrix factorization with learned temporal smoothness prior") and will also chair a special session on speech and audio processing.

Audionamix Research @ Interspeech 2011 in Florence, Italy

August 11th, 2011

Audionamix research team members will be attending the Interspeech conference. This major event on speech science and technology will be held in Florence, Italy, from the 27th to the 31st of August 2011.
If you would like to meet with us, please contact us at research (at) audionamix.com.

Convolved Common Audio Signal Extraction

June 8th, 2011

These sound examples correspond to the submitted article with the above title.

The examples are a selection from the full set of multichannel mixtures. The goal is to separate the music and effects (mfx) contribution from the dialog contribution in movie soundtracks.
This problem can be seen as a common signal extraction task, where the common signal is the mfx contribution.
Our previous method, based on geometric common signal extraction, gave very conclusive results under the hypothesis that the music and effects tracks are exactly the same in the different versions.
The method proposed in the present article addresses a more realistic case by handling mfx tracks that differ in equalization (i.e., filtering).
We present here the results described in the article, for the convolved case only. The mfx track of each version has been individually filtered before being mixed with the corresponding dialog track.
We present the results for N=3 and N=5 versions. Note that the N-SP-SUB method only estimates one mfx track.
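The construction of the convolved test mixtures described above can be sketched as follows. This is only a minimal illustration, not the code used for the article: the function name `make_convolved_mixtures`, the use of short FIR filters as stand-ins for the equalizations, and the truncation of the convolution tail are all assumptions.

```python
import numpy as np

def make_convolved_mixtures(mfx, dialogs, filters):
    """Build one mixture per version: filtered common mfx + version-specific dialog.

    mfx:     1-D array, the common music-and-effects signal
    dialogs: list of N 1-D arrays (same length as mfx), one dialog per version
    filters: list of N 1-D FIR impulse responses (hypothetical equalizations)
    """
    mixtures, mfx_versions = [], []
    for dialog, h in zip(dialogs, filters):
        # Filter the common signal individually for this version and
        # truncate the convolution tail to keep the original length.
        mfx_n = np.convolve(mfx, h)[: len(mfx)]
        mfx_versions.append(mfx_n)
        mixtures.append(mfx_n + dialog)
    return np.array(mixtures), np.array(mfx_versions)
```

With an identity filter (`[1.0]`) for every version, this reduces to the non-convolved case assumed by the earlier geometric method, where the mfx track is exactly the same in all versions.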

N=5 channels

Original   mfx1   mfx2   mfx3   mfx4   mfx5   dialog1   dialog2   dialog3   dialog4   dialog5
Mixed      -      -      -      -      -      Mix 1     Mix 2     Mix 3     Mix 4     Mix 5
Results:
N-SP-SUB   common mfx                         dialog1   dialog2   dialog3   dialog4   dialog5
CCNMF      mfx1   mfx2   mfx3   mfx4   mfx5   dialog1   dialog2   dialog3   dialog4   dialog5

N=3 channels

Original   mfx1   mfx2   mfx3   dialog1   dialog2   dialog3
Mixed      -      -      -      Mix 1     Mix 2     Mix 3
Results:
N-SP-SUB   common mfx           dialog1   dialog2   dialog3
CCNMF      mfx1   mfx2   mfx3   dialog1   dialog2   dialog3

Geometric Multichannel Common Signal Separation With Application to Music and Effects Extraction from Film Soundtracks

November 29th, 2010

These sound examples correspond to the submitted article with the above title.

The examples are selections out of all multichannel mixtures (5 per number of channels).

Example with N=3 input channels

Original     Music + Effects   Dialog 1   Dialog 2   Dialog 3
Mixed        -                 Mix 1      Mix 2      Mix 3
Results:
Median       Music + Effects   -          -          -
Cone         Music + Effects   -          -          -
N-SP / min   Music + Effects   Dialog 1   Dialog 2   Dialog 3
N-SP-SUB     Music + Effects   Dialog 1   Dialog 2   Dialog 3

Example with N=4 input channels

Original     Music + Effects   Dialog 1   Dialog 2   Dialog 3   Dialog 4
Mixed        -                 Mix 1      Mix 2      Mix 3      Mix 4
Results:
Median       Music + Effects   -          -          -          -
Cone         Music + Effects   -          -          -          -
N-SP / min   Music + Effects   Dialog 1   Dialog 2   Dialog 3   Dialog 4
N-SP-SUB     Music + Effects   Dialog 1   Dialog 2   Dialog 3   Dialog 4

Example with N=5 input channels

Original     Music + Effects   Dialog 1   Dialog 2   Dialog 3   Dialog 4   Dialog 5
Mixed        -                 Mix 1      Mix 2      Mix 3      Mix 4      Mix 5
Results:
Median       Music + Effects   -          -          -          -          -
Cone         Music + Effects   -          -          -          -          -
N-SP / min   Music + Effects   Dialog 1   Dialog 2   Dialog 3   Dialog 4   Dialog 5
N-SP-SUB     Music + Effects   Dialog 1   Dialog 2   Dialog 3   Dialog 4   Dialog 5

Adaptation of source-specific dictionaries in Non-Negative Factorization for Source Separation

November 26th, 2010

Status:

Submitted to the ICASSP 2011 conference

Authors:

Xabier Jaureguiberry

Pierre Leveau

Simon Maller

Juan José Burred

Abstract:

This paper concerns the adaptation of spectrum dictionaries in audio source separation with supervised learning. Supposing that samples of the audio sources to separate are available, a filter adaptation in the frequency domain is proposed in the context of Non-Negative Matrix Factorization with the Itakura-Saito divergence. The algorithm is able to retrieve the acoustical filter applied to the sources with good accuracy, and achieves significantly better performance on separation tasks than the non-adaptive model.
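To give a rough idea of what a frequency-domain filter adaptation can look like in NMF with the Itakura-Saito (IS) divergence, here is a minimal sketch. It is not the algorithm of the article: it assumes a simplified single-source model V ≈ diag(f) W H, with a fixed pre-learned dictionary W, and the names `adapt_filter_is_nmf` and `is_divergence` are hypothetical. The update of H is the usual heuristic multiplicative update for IS-NMF; the update of f is the closed-form minimizer of the IS divergence under this simplified model.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence between two strictly positive arrays."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def adapt_filter_is_nmf(V, W, n_iter=100, eps=1e-12, seed=0):
    """Estimate a per-frequency filter f and activations H so that
    V ~= diag(f) @ W @ H, keeping the learned dictionary W fixed.

    V: (n_freq, n_frames) strictly positive power spectrogram
    W: (n_freq, n_atoms) strictly positive dictionary learned beforehand
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    H = rng.random((W.shape[1], n_frames)) + eps
    f = np.ones(n_freq)
    # history[0] is the divergence before any update.
    history = [is_divergence(V, f[:, None] * (W @ H) + eps)]
    for _ in range(n_iter):
        # Multiplicative update of the activations with the filtered dictionary.
        W_f = f[:, None] * W
        V_hat = W_f @ H + eps
        H *= (W_f.T @ (V / V_hat ** 2)) / (W_f.T @ (1.0 / V_hat))
        # Closed-form filter update: f_k = mean over frames of V / (W H).
        P = W @ H + eps
        f = np.mean(V / P, axis=1)
        history.append(is_divergence(V, f[:, None] * P))
    return f, H, history
```

Note that f and H are only defined up to a common scale factor in this model, so the estimated filter should be compared with a reference only after normalization.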

Experiments and Results

We choose two different classes of instruments to test our approach: two polyphonic instruments, piano and guitar, and one monophonic instrument, bass. (Bass can be polyphonic, but we only address its monophonic usage here.)

The tracks come from real multi-track recordings, so the instruments are expected to play in synchrony and in harmony. The training signal for each source is built from samples of the RWC database [1], and consists of a concatenation of the whole range of notes of one single instrument per source.

The test data is taken from a commercial recording from which the separated tracks have been made available.

We generate a two-source 0 dB mono mixture which contains the source to separate and another available source (drums).

This leads to three different tests:

  1. Piano test. Source 1: piano, source 2: drums.
  2. Guitar test. Source 1: guitar, source 2: drums.
  3. Bass test. Source 1: bass, source 2: drums.
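The two-source 0 dB mixtures used in these tests can be sketched as follows, assuming that "0 dB" means that both sources have the same average power in the mixture. The function name `mix_at_ratio` and its parameters are hypothetical and do not come from the article.

```python
import numpy as np

def mix_at_ratio(source, interferer, ratio_db=0.0):
    """Sum two mono signals so that the source-to-interferer power ratio
    equals ratio_db (0 dB means equal average power).

    Returns the mixture and the rescaled interferer.
    """
    p_src = np.mean(source ** 2)
    p_int = np.mean(interferer ** 2)
    # Gain applied to the interferer to reach the requested power ratio.
    gain = np.sqrt(p_src / (p_int * 10.0 ** (ratio_db / 10.0)))
    scaled = gain * interferer
    return source + scaled, scaled
```

For the piano test, for instance, `source` would be the piano track and `interferer` the drums track.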

We compare the separation results obtained with and without filter adaptation. These examples are the ones with the higher source-to-distortion ratio (SDR), corresponding to Table 2 in the article.
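For readers unfamiliar with the SDR, here is a simplified definition as a sketch. It treats everything that differs from the reference source as distortion; this is an assumption made for illustration, and it is not necessarily the exact evaluation procedure used in the article (toolboxes such as BSS Eval use a more elaborate decomposition). The function name `simple_sdr_db` is hypothetical.

```python
import numpy as np

def simple_sdr_db(reference, estimate):
    """Simplified source-to-distortion ratio in dB.

    The distortion is taken to be the whole difference between
    the estimated source and the reference source.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(distortion ** 2))
```

A higher value means that the extracted source is closer to the original track.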

              Original mixture   Extracted source w/ filter adaptation   Extracted source w/o filter adaptation
Piano Test    mix                with filter                             without filter
Guitar Test   mix                with filter                             without filter
Bass Test     mix                with filter                             without filter

[1] M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka, “RWC music database: Music genre database and musical instrument sound database”, in Proc. International Conference on Music Information Retrieval (ISMIR), Baltimore, USA, 2003.

Sound Enhancement using sparse approximation with Speclets

March 12th, 2010

Status:

Accepted at the ICASSP 2010 conference

Authors:

Manuel Moussallam

Pierre Leveau

Si Mohamed Aziz Sbaï

Abstract:

This paper presents an innovative approach to the informed enhancement of damaged sound. It uses sparse approximations with a learned dictionary of atoms modeling the main components of the undamaged source spectra. The decomposition process aims at finding which of the atoms could constitute the decomposition of the undamaged source, in order to recover it. The decomposition of the damaged signal is done with a Matching Pursuit algorithm and involves an adaptation of the dictionary learned on undamaged sources. The technique has been evaluated on synthetic signals, and encouraging results are reported for a real trumpet signal.

Experiments and Results

Overview:

For synthetic signals, the original signal is a harmonic series with a fundamental frequency of 400 Hz and 40 partials with exponentially decreasing amplitudes, so it has energy up to 16 kHz. The signal is also windowed in time by a Gaussian window.
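A signal of this kind can be generated with a few lines of code. The sketch below follows the description above (400 Hz fundamental, 40 partials, exponentially decreasing amplitudes, Gaussian time window); the sampling rate, duration, decay rate and window width are assumptions chosen for illustration, since the post does not state them, and the function name `harmonic_note` is hypothetical.

```python
import numpy as np

def harmonic_note(f0=400.0, n_partials=40, sr=44100, duration=1.0,
                  decay=0.1, window_std=0.15):
    """Harmonic series with exponentially decreasing partial amplitudes,
    shaped in time by a Gaussian window.

    decay:      amplitude of partial k is exp(-decay * (k - 1)) (assumed value)
    window_std: standard deviation of the window, as a fraction of the duration
    """
    t = np.arange(int(sr * duration)) / sr
    signal = np.zeros_like(t)
    for k in range(1, n_partials + 1):
        amplitude = np.exp(-decay * (k - 1))
        signal += amplitude * np.sin(2.0 * np.pi * k * f0 * t)
    # Gaussian window centered in the middle of the signal.
    window = np.exp(-0.5 * ((t - duration / 2.0) / (window_std * duration)) ** 2)
    return signal * window
```

With the default values, the highest partial is at 40 × 400 Hz = 16 kHz, which matches the bandwidth given in the text. A training set of "similar signals" could then be obtained by calling the function with different values of `f0`.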

The learning process is conducted on a set of similar signals, having only their fundamental frequencies varying.

The results are the following:
For one note:

synthetic_original

synthetic_damaged_fc6k_destructif

synthetic_restored_fc6k_destructif

For a mix of two notes:

synthetic_2notes_original

synthetic_2notes_damaged_fc5k_destructif

synthetic_2notes_recovered_fc5k_destructif

The original synthetic signal sounds like a buzz. The damaged signal is the same, except that we intentionally destroyed the frequencies above 8 kHz; it sounds like a heavily filtered version of the original. This damage is destructive, which means that all partials above 8 kHz are actually removed, not just attenuated.
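The difference between destructive and non-destructive damage can be illustrated with a simple FFT-based sketch. This is only one possible way to implement such a damaging filter, not necessarily the one used for these examples; the function name `damage_lowpass` and the `attenuation_db` value are assumptions.

```python
import numpy as np

def damage_lowpass(x, sr, cutoff_hz, destructive=True, attenuation_db=40.0):
    """Damage a signal above cutoff_hz.

    destructive=True:  frequency bins above the cutoff are set to zero.
    destructive=False: bins above the cutoff are only attenuated
                       by attenuation_db (assumed value).
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    above = freqs > cutoff_hz
    if destructive:
        spectrum[above] = 0.0
    else:
        spectrum[above] *= 10.0 ** (-attenuation_db / 20.0)
    return np.fft.irfft(spectrum, n=len(x))
```

In the destructive case, no information about the high partials remains in the damaged signal, so the enhancement has to rely entirely on what was learned from undamaged sources.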

The enhancement method manages to recreate the missing parts of the spectrum by replacing atoms from the processed dictionary with the corresponding full-band versions of the same atoms. It can easily be heard that the reconstructed signal is very close to the original one, and that a large part of the missing frequencies has been recovered.

For real trumpet signals, a dictionary of trumpet spectra is learned using the RWC Database. We then chose a short segment of trumpet notes, applied a damaging filter to it, and reconstructed it with the described method.
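The atom replacement idea can be sketched as follows. This is a simplified illustration under several assumptions, not the implementation of the paper: it works on generic vectors rather than on the spectral atoms described in the abstract, it assumes that the damage is a known linear operation applied identically to the signal and to the dictionary, and the function names `matching_pursuit` and `enhance_by_atom_replacement` are hypothetical.

```python
import numpy as np

def matching_pursuit(x, D, n_atoms):
    """Greedy Matching Pursuit. D must have unit-norm columns.
    Returns the coefficient vector of the decomposition."""
    residual = np.asarray(x, dtype=float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual
        k = int(np.argmax(np.abs(corr)))  # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]
    return coeffs

def enhance_by_atom_replacement(x_damaged, D_full, damage, n_atoms):
    """Decompose the damaged signal on a damaged ("processed") dictionary,
    then resynthesize with the corresponding full-band atoms.

    D_full: (n_samples, n_atoms_total) dictionary learned on undamaged sources
    damage: callable applying the (assumed linear) damage to a 1-D array
    """
    n_total = D_full.shape[1]
    # Processed dictionary: each full-band atom passed through the damage.
    D_proc = np.stack([damage(D_full[:, k]) for k in range(n_total)], axis=1)
    norms = np.linalg.norm(D_proc, axis=0)
    coeffs = matching_pursuit(x_damaged, D_proc / norms, n_atoms)
    # Undo the normalization, then swap in the full-band atoms.
    return D_full @ (coeffs / norms)
```

The quality of the result depends on how well the processed atoms can still be told apart after the damage: if two atoms differ only in the destroyed band, the decomposition cannot choose between them.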

The results are the following:

For a single trumpet note:

carioca_1note_original

carioca_1note_dicodifferent_damaged_fc6k_destructif

carioca_1note_dicodifferent_recovered_fc6k_destructif

For a full phrase:

real_original

real_damaged_fc6khz_nondestructif

real_restored_fc6k_nondestructif


Protected: Results – Tino Rossi – Petit Papa Noël

May 19th, 2009

This post is password protected.


Protected: Dubliners Springhill Mine Disaster test results

April 30th, 2009

This post is password protected.