Multiclass Feature Selection with Kernel Gram-matrix based criteria

General Description

This page provides some complementary information about the paper "Multiclass Feature Selection with Kernel Gram-matrix based criteria", M. Ramona, G. Richard, B. David, submitted to IEEE Transactions on Neural Networks and Learning Systems:

  • The source code of the proposed methods (detailed in this section) is available here.
  • The datasets used in the experimental part are detailed in this section, where links are provided to download the original data.
  • The speech/music dataset that we provide is freely available here and detailed in this section.

If you have any questions or remarks concerning the dataset, please feel free to contact me.

Feature Selection Methods

The paper presents new feature selection methods adapted to Support Vector Machines, based on recent kernel Gram-matrix based criteria:

  • Scaled Alignment Selection: scale-factor gradient descent on the Kernel Target Alignment criterion.
  • Scaled Frobenius Selection: scale-factor gradient descent on the Frobenius criterion (i.e. unnormalized Alignment).
  • Scaled Class Separability Selection: scale-factor gradient descent on the Kernel Class Separability criterion.
  • Kernel Fisher Discriminant Selection: a Kernel Fisher Discriminant based method, used with a linear kernel.
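To make the scale-factor idea concrete, here is a minimal numerical sketch (not the paper's implementation) of the Kernel Target Alignment criterion and of a Gaussian kernel parameterized by per-feature scale factors; the function names and the fixed bandwidth `sigma` are illustrative assumptions.

```python
import numpy as np

def kernel_target_alignment(K, y):
    """Kernel Target Alignment A(K, yy') = <K, yy'>_F / (||K||_F ||yy'||_F)."""
    Kyy = np.outer(y, y)                       # ideal target kernel yy'
    num = np.sum(K * Kyy)                      # Frobenius inner product <K, yy'>_F
    den = np.linalg.norm(K) * np.linalg.norm(Kyy)
    return num / den

def gaussian_kernel(X, scales, sigma=1.0):
    """Gaussian kernel with per-feature scale factors, the quantities that the
    scaled selection methods optimize by gradient descent."""
    Xs = X * scales                            # apply one scale factor per feature
    sq = np.sum(Xs**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * Xs @ Xs.T   # squared pairwise distances
    return np.exp(-d2 / (2 * sigma**2))
```

In this setting, a feature's relevance can be read from its learned scale factor: driving a scale factor to zero removes the feature from the kernel, and the alignment measures how well the resulting kernel matches the label structure.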

The proposed methods are compared to the classic Fisher criterion and to state-of-the-art SVM-based feature selection algorithms (for convenience, the reference numbers below match those of the article):

  • AROM: zero-norm minimization with l2-norm approximation, used with a linear kernel [12].
  • RFE: recursive feature elimination based on the components of the normal vector [13].
  • R2W2: scale-factor gradient descent on the radius-margin criterion [14].
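As an illustration of the RFE baseline [13], the sketch below repeatedly trains a linear model and discards the feature whose weight contributes least to the decision function. A ridge-regression solution stands in for the SVM normal vector here, purely to keep the example self-contained; the function name and `lam` parameter are assumptions of this sketch.

```python
import numpy as np

def rfe_ranking(X, y, n_keep=2, lam=1e-3):
    """Recursive feature elimination: train a linear model, drop the feature
    with the smallest squared weight w_j^2, and repeat until n_keep remain."""
    active = list(range(X.shape[1]))
    eliminated = []
    while len(active) > n_keep:
        Xa = X[:, active]
        # linear model weights: w = (Xa'Xa + lam*I)^-1 Xa' y (ridge stand-in for the SVM)
        w = np.linalg.solve(Xa.T @ Xa + lam * np.eye(len(active)), Xa.T @ y)
        worst = int(np.argmin(w**2))           # weakest contribution to the boundary
        eliminated.append(active.pop(worst))
    # features eliminated last are ranked highest
    return active, eliminated[::-1]
```

On data where one feature carries the label and the others are noise, the informative feature survives all elimination rounds.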

Datasets

The experimental protocol involves various datasets retrieved from the literature:

  • Toy linear: a randomly generated, linearly separable artificial dataset, described in [12][14]. The Matlab source code generating the examples is available here on Jason Weston's public web site.
  • Toy non-linear: a randomly generated, linearly non-separable artificial dataset, described in [12][14].
  • Spambase: a spam e-mail detection dataset, retrieved from the UCI repository [26]. The full dataset is available here.
  • Ionosphere: a discrimination dataset based on antenna pulse features, retrieved from the UCI repository [26]. The full dataset is available here.
  • Lymphoma: a DNA microarray dataset for the B-Cell Lymphoma problem, described in [12]. The dataset is available here on Jason Weston's web site.

Speech/Music dataset

We also provide results on a dataset built from our own research field, audio indexing; the dataset, as a Matlab structure, is available here.

The dataset is based on the audio files provided in the corpus of the ESTER (Évaluation des Systèmes de Transcription Enrichie d'émissions Radiophoniques) French evaluation campaign on broadcast show transcription. This campaign includes, among other tasks, a Sound Event Segmentation task which consists in locating speech and music regions in an audio file. Please note that only the computed features are given, the audio corpus material being subject to license restrictions.

A large collection of features has been computed on the ESTER speech and music regions of the training set, divided between short-term features, computed on 30 ms frames with 15 ms overlap, and long-term features, computed on 1 s frames with 0.5 s overlap. The short-term features include:

  • AR: the first 2 coefficients of an autoregressive filter.
  • MFCC: 13 Mel-Frequency Cepstral Coefficients including the energy, along with their first and second derivatives.
  • ZCR: the Zero Crossing Rate.
  • Q-constant: 6 constant-Q filter coefficients based on octave-wide filters, and 10 based on fifth-wide filters.
  • OBSI: the log-energies of a 9-band octave filter bank, along with the 8 ratios of adjacent coefficients. OBSI (Octave Band Signal Intensities) was proposed by S. Essid in his thesis [Essid05].
  • Perc: perceptual spread and sharpness, computed from the loudness measure.
  • S desc: spectral descriptors including the spectral slope, decrease, flux, the cutoff frequency and the geometric-to-arithmetic mean ratio.
  • S moments: spectral statistical moments including the spectral centroid, width, asymmetry and flatness.
  • T moments: temporal statistical moments (same definitions).
  • YIN: two features extracted with the YIN library released by Alain de Cheveigné: the f0 pitch estimate and the aperiodicity measure.

These features are provided along with their first and second derivatives, except for AR, ZCR, OBSI, S desc and YIN. The mean and standard deviation of the short-term features are computed over long-term frames, so that only long-term features are used in the end.
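The two-level analysis described above (short-term frames, then long-term integration by mean and standard deviation) can be sketched as follows, using the ZCR as an example short-term feature; the 16 kHz sample rate, the function names and the synthetic signal are assumptions of this sketch, not part of the dataset specification.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a signal into overlapping frames (e.g. 30 ms frames with a 15 ms hop)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def zcr(frames):
    """Short-term Zero Crossing Rate, one value per frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

# Short-term analysis: 30 ms frames with 15 ms overlap, at an assumed 16 kHz rate.
sr = 16000
x = np.random.default_rng(0).normal(size=sr * 2)       # 2 s of synthetic audio
st = zcr(frame_signal(x, int(0.030 * sr), int(0.015 * sr)))

# Long-term integration: mean and std of the short-term feature over 1 s frames
# with a 0.5 s hop (measured here in short-term frames: 1 s / 15 ms hop ~= 66 frames).
lt_len, lt_hop = 66, 33
lt_frames = frame_signal(st, lt_len, lt_hop)
long_term = np.column_stack([lt_frames.mean(axis=1), lt_frames.std(axis=1)])
```

Each row of `long_term` is one long-term frame, with the mean and standard deviation of the short-term feature as its two columns.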

The long-term features include:

  • AM: amplitude modulation descriptors, including the 4 values computed on both the granularity (4–8 Hz) and rugosity (10–40 Hz) spectral bands.
  • LT moments: the same temporal statistical moments, computed on long-term frames.
  • LE moments: the same temporal statistical moments, computed on the amplitude envelopes of long-term frames.
  • LZCR: the ZCR computed on long-term frames.

The long-term moment features (LT and LE moments) are provided along with their first and second derivatives.

The dataset is provided here as a Matlab structure. The collection includes a total of 133 features, forming a 321-dimensional feature matrix X over 20000 frames (10000 per class). The vector Y gives the class label of each example: class music is labeled -1 and class speech +1. All features have been centered and divided by their standard deviation.
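The preprocessing stated above (each feature centered and divided by its standard deviation) amounts to the following, shown on a synthetic stand-in with the documented layout; the `standardize` name and the random values are assumptions of this sketch. With the actual `.mat` file, `scipy.io.loadmat` would presumably expose the X and Y fields described above.

```python
import numpy as np

def standardize(X):
    """Center each feature (column) and divide it by its standard deviation,
    as done for the provided feature matrix."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Synthetic stand-in matching the documented layout: 20000 frames x 321 dimensions,
# labels -1 (music) for the first half and +1 (speech) for the second.
rng = np.random.default_rng(0)
X = standardize(rng.normal(loc=3.0, scale=2.0, size=(20000, 321)))
Y = np.concatenate([-np.ones(10000), np.ones(10000)])
```

After standardization, every column of X has zero mean and unit standard deviation, which puts all 321 dimensions on a comparable scale before feature selection.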

References

[12] J. Weston, A. Elisseeff, B. Scholkopf, and M. Tipping, “Use of the zero-norm with linear models and kernel methods,” Journal of Machine Learning Research, vol. 3, pp. 1439–1491, March 2003. (http://portal.acm.org/citation.cfm?id=944919.944982)

[13] I. Guyon, J. Weston, S. Barnhill and V. Vapnik, "Gene Selection for Cancer Classification using Support Vector Machines," Machine Learning, vol. 46 (1-3), pp 389-422, 2002. (http://portal.acm.org/citation.cfm?id=599671)

[14] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik, “Feature Selection for SVMs,” in Advances in Neural Information Processing Systems, vol. 13. MIT Press, Cambridge, MA, 2000, pp. 668–674. (http://citeseer.ist.psu.edu/weston01feature.html)

[26] A. Asuncion and D. Newman, “UCI machine learning repository,” 2007. (http://archive.ics.uci.edu/ml/)

[Essid05] S. Essid, “Classification automatique des signaux audio-fréquences : reconnaissance des instruments de musique,” PhD Thesis, Université Pierre et Marie Curie, 2005. (http://pastel.paristech.org/2738/)
