src.classification.applePy package

Submodules

src.classification.applePy.channel_selection module

Code for channel selection.

class src.classification.applePy.channel_selection.ElectrodeSelection(nelec=16, metric='riemann', n_jobs=1)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Channel selection based on a Riemannian geometry criterion.

For each class, a centroid is estimated, and the channel selection is based on maximizing the distance between centroids. This is done by backward elimination: at each iteration, the electrode that contributes least to the distance is removed from the subset. The algorithm is described in [1].

Parameters

nelec : int (default 16)

The number of electrodes to keep in the final subset.

metric : string | dict (default: ‘riemann’)

The type of metric used for centroid and distance estimation. See mean_covariance for the list of supported metrics. The metric can also be a dict with two keys, mean and distance, in order to pass different metrics for the centroid estimation and the distance estimation. A typical use case is to pass the ‘logeuclid’ metric for the mean in order to speed up computation, and ‘riemann’ for the distance in order to keep good sensitivity for the selection.

n_jobs : int, (default: 1)

The number of jobs to use for the computation. This works by computing each of the class centroid in parallel. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
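As a rough illustration of the n_jobs convention described above, the following sketch maps a setting to a worker count. The helper name resolve_n_jobs and its n_cpus argument are hypothetical and are not part of applePy; the arithmetic only follows the rule stated in the parameter description.

```python
def resolve_n_jobs(n_jobs: int, n_cpus: int) -> int:
    """Translate an n_jobs setting into a number of workers (hypothetical helper)."""
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all CPUs but one, and so on.
        return n_cpus + 1 + n_jobs
    # Positive values are taken as the number of workers directly.
    return n_jobs


print(resolve_n_jobs(-1, n_cpus=8))  # 8
print(resolve_n_jobs(-2, n_cpus=8))  # 7
print(resolve_n_jobs(1, n_cpus=8))   # 1
```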

Attributes

covmeans_ : list

the class centroids.

dist_ : list

List of distances at each iteration.

References

[1] A. Barachant and S. Bonnet, “Channel selection procedure using riemannian distance for BCI applications,” in 2011 5th International IEEE/EMBS Conference on Neural Engineering (NER), 2011, 348-351
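The backward elimination described above can be sketched as follows. This is a minimal illustration and not the code of ElectrodeSelection: it assumes NumPy is available, takes precomputed class centroids as input, scores a subset by the sum of pairwise affine-invariant Riemannian distances between the reduced centroids, and uses hypothetical function names (riemann_distance, backward_elimination).

```python
import numpy as np


def riemann_distance(a, b):
    """Affine-invariant Riemannian distance between two SPD matrices."""
    # The eigenvalues of inv(a) @ b are real and positive for SPD inputs.
    eigvals = np.linalg.eigvals(np.linalg.solve(a, b)).real
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))


def backward_elimination(centroids, nelec):
    """Drop one electrode per iteration until `nelec` electrodes remain.

    centroids: list of (n_channels, n_channels) SPD class centroids.
    Returns the kept electrode indices and the criterion after each removal.
    """
    kept = list(range(centroids[0].shape[0]))
    history = []
    while len(kept) > nelec:
        best_score, best_subset = -np.inf, None
        for candidate in kept:
            subset = [ch for ch in kept if ch != candidate]
            idx = np.ix_(subset, subset)
            # Sum of pairwise distances between the reduced class centroids.
            score = sum(
                riemann_distance(centroids[i][idx], centroids[j][idx])
                for i in range(len(centroids))
                for j in range(i + 1, len(centroids))
            )
            # Removing `candidate` is best if the remaining distance is largest.
            if score > best_score:
                best_score, best_subset = score, subset
        kept = best_subset
        history.append(best_score)
    return kept, history


# Toy centroids: the two classes differ only on channels 0 and 2.
c0 = np.diag([1.0, 1.0, 1.0, 1.0])
c1 = np.diag([4.0, 1.0, 9.0, 1.0])
kept, history = backward_elimination([c0, c1], nelec=2)
print(kept)  # [0, 2]
```

In this toy case the uninformative channels 1 and 3 are removed, because dropping them leaves the distance between the two centroids unchanged.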

fit(X, y=None, sample_weight=None)[source]

Find the optimal subset of electrodes.

Parameters

X : ndarray, shape (n_trials, n_channels, n_channels)

ndarray of SPD matrices.

y : ndarray shape (n_trials, 1)

labels corresponding to each trial.

sample_weight : None | ndarray shape (n_trials, 1)

The weight of each sample. If None, all samples are weighted equally.

Returns

self : ElectrodeSelection instance

The ElectrodeSelection instance.
score(estimator, x_test, y_test)[source]
transform(X)[source]

Return reduced matrices.

Parameters

X : ndarray, shape (n_trials, n_channels, n_channels)

ndarray of SPD matrices.

Returns

covs : ndarray, shape (n_trials, n_elec, n_elec)

The covariance matrices after reduction of the number of channels.
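Reducing the matrices amounts to keeping, for every trial, only the rows and columns of the selected electrodes. A minimal pure-Python sketch of that operation, assuming the kept indices are already known; reduce_covariances is a hypothetical name and nested lists stand in for the ndarray:

```python
def reduce_covariances(covs, kept):
    """Keep only the rows and columns listed in `kept` for every trial."""
    return [[[cov[i][j] for j in kept] for i in kept] for cov in covs]


# One trial with three channels; keep channels 0 and 2.
covs = [[[1.0, 0.1, 0.2],
         [0.1, 2.0, 0.3],
         [0.2, 0.3, 3.0]]]
reduced = reduce_covariances(covs, kept=[0, 2])
print(reduced)  # [[[1.0, 0.2], [0.2, 3.0]]]
```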
class src.classification.applePy.channel_selection.ElectrodeSelectionRaw(nelec=16, metric='riemann', n_jobs=1)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None, sample_weight=None)[source]

Find the optimal subset of electrodes.

transform(X)[source]

Return reduced matrices.

src.classification.applePy.channel_selection_without_covariances module

Code for channel selection.

src.classification.applePy.classifier module

class src.classification.applePy.classifier.ApplePyClassifier(used_pipelines=None)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Global class dealing with automated classification. Inherits from BaseEstimator and TransformerMixin so that it can be used alongside estimators from other libraries. This class contains:

  • A pipeline catalogue with all the pipelines and the important information about the parameters to fit
  • All the predictions and the prediction probabilities for all pipelines
  • All the correct answers
  • All the scores, confusion matrices, precisions, recalls, and ROC information for all pipelines
  • The dataset and the labels, as well as source dataset
  • The group indices
  • The optional test dataset, labels and groups
classify(dataset, dataset_path=None, test_dataset_size=5, cv_value=5, independent_features_selection=False, channels_to_select=20, use_groups=True, tune_hypers=False, classify_test=False)[source]

Global classification method of the library. (inter-subjects)

Reads the dataset, can apply independent features selection, can tune hyperparameters, fits, predicts, scores, and shows results.

Parameters

dataset : string; the path to the dataset

test_dataset_size : int; the fraction of the dataset to be considered for testing

pre_epoched : boolean; whether the dataset is already epoched or not

tmin, tmax : time limits for the epochs

bads : list of electrodes to be rejected

picks : list of electrodes to be worked on

filtering : tuple containing the higher and lower frequencies to filter the data

ICA : Boolean, whether to apply Independent Component Analysis or not

resample : boolean, whether to resample the data at 512 Hz or not

baseline : the baseline to be applied to the data

cv_value : int; the number of folds for cross validation. If None, the number of subjects will be used (Leave one out)

independent_features_selection : boolean; whether to apply independent features selection or not

channels_to_select : int or None; the number of channels to be selected or None if automatic number selection

use_groups : boolean; whether to use groups for cross validation or not

tune_hypers : boolean; whether to tune hyperparameters or not

names : list; names of the categories

classify_test : boolean; whether there should be a separate test dataset or not
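When cv_value is None, the description above says that one fold per subject is used (leave one out). A minimal sketch of such a split, assuming each sample carries a subject (group) index; leave_one_group_out is a hypothetical helper and not an applePy function:

```python
def leave_one_group_out(groups):
    """Yield (train_idx, test_idx) pairs, holding out one group per fold."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield train, test


# Six samples recorded from three subjects.
groups = [0, 0, 1, 1, 2, 2]
folds = list(leave_one_group_out(groups))
for train, test in folds:
    print(train, test)
```

Splitting by group keeps all samples of one subject on the same side of the split, so a subject never appears in both the training and the test indices of a fold.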

classify_intraSubject(dataset, divided_dataset=True, nb_subj=None, test_dataset_size=5, pre_epoched=True, tmin=-0.2, tmax=0.5, bads=None, picks=None, filtering=[None, None], ICA=False, resample=False, baseline=None, event_ids=[None, None], reference=None, cv_value=5, independent_features_selection=False, channels_to_select=20, tune_hypers=False, names=[0, 1], classify_test=False, use_all_pipelines=False)[source]

Global classification method of the library. (intra-subject)

Reads the dataset, can apply independent features selection, can tune hyperparameters, fits, predicts, scores, and shows results.

Parameters

dataset : string; the path to the dataset

nb_subj : int; number of subjects to consider

test_dataset_size : int; the fraction of the dataset to be considered for testing

pre_epoched : boolean; whether or not the dataset is already epoched

tmin, tmax : time limits for the epochs

bads : list of electrodes to be rejected

picks : list of electrodes to be worked on

filtering : tuple containing the higher and lower frequencies to filter the data

ICA : Boolean, whether or not to apply Independent Component Analysis

resample : boolean, whether or not to resample the data at 512 Hz

baseline : the baseline to be applied to the data

cv_value : int; the number of folds for cross validation. If None, the number of subjects will be used (Leave one out)

independent_features_selection : boolean; whether or not to apply independent features selection

channels_to_select : int or None; the number of channels to be selected or None if automatic number selection

tune_hypers : boolean; whether or not to tune hyperparameters

names : list; names of the categories

classify_test : boolean; whether or not there should be a separate test dataset

use_all_pipelines : boolean; whether or not all the pipelines should be used for classification or only a subset

classify_with_CNN(dataset, nb_subj=None, divided_dataset=True, pre_epoched=True, tmin=0, tmax=0.5, bads=None, picks=None, filtering=[None, None], ICA=False, resample=False, baseline=None, event_ids=[None, None], reference=None, test_size=5)[source]

Global classification method of the library. (inter-subjects)

Reads the dataset, can apply independent features selection, can tune hyperparameters, fits, predicts, scores, and shows results.

Parameters

dataset : string; the path to the dataset

nb_subj : int; number of subjects to consider

test_dataset_size : int; the fraction of the dataset to be considered for testing

pre_epoched : boolean; whether or not the dataset is already epoched

tmin, tmax : time limits for the epochs

bads : list of electrodes to be rejected

picks : list of electrodes to be worked on

filtering : tuple containing the higher and lower frequencies to filter the data

ICA : Boolean, whether or not to apply Independent Component Analysis

resample : boolean, whether or not to resample the data at 512 Hz

baseline : the baseline to be applied to the data

cv_value : int; the number of folds for cross validation. If None, the number of subjects will be used (Leave one out)

independent_features_selection : boolean; whether or not to apply independent features selection

channels_to_select : int or None; the number of channels to be selected or None if automatic number selection

use_groups : boolean; whether or not to use groups for cross validation

tune_hypers : boolean; whether or not to tune hyperparameters

names : list; names of the categories

classify_test : boolean; whether or not there should be a separate test dataset

use_all_pipelines : boolean; whether or not all the pipelines should be used for classification or only a subset

count_subjects(directory)[source]

Count the number of subjects in the dataset.

Parameters

directory : the path to the dataset

create_folder(k)[source]

Creates the k-folds folder for the data.

Parameters

k : int

number of folds
delete_pipelines(pipeline_names)[source]

Deletes a pipeline by calling the method from the pipeline catalogue. Additionally, deletes all occurrences of the pipeline.

Parameters

name : string

name of the pipeline
estimate_covariance_matrices(dataset)[source]

Creates the simple covariance matrices for the raw data.

Result format = nb_subj x nb_epochs x nb_channels x nb_channels

Parameters

dataset : the dataset on which to estimate the covariance matrices.
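For reference, a simple covariance matrix for a single epoch can be computed as below. This is a sketch under assumptions: the source does not say which estimator or normalisation the method uses, so the unbiased sample covariance (division by n_times - 1) is shown, and sample_covariance is a hypothetical name.

```python
def sample_covariance(epoch):
    """Covariance of one epoch given as a list of channels, each a list of samples."""
    n_times = len(epoch[0])
    centered = []
    for channel in epoch:
        mean = sum(channel) / n_times
        centered.append([value - mean for value in channel])
    # Entry (i, j) is the covariance between channel i and channel j.
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n_times - 1) for cj in centered]
        for ci in centered
    ]


epoch = [[1.0, 2.0, 3.0], [1.0, 0.0, 2.0]]  # 2 channels x 3 time samples
cov = sample_covariance(epoch)
print(cov)  # [[1.0, 0.5], [0.5, 1.0]]
```

Applying this to every epoch of every subject would give the nb_subj x nb_epochs x nb_channels x nb_channels layout mentioned above.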

estimate_sources(raw_dataset, info, tmin_noise, tmax_noise, trans=None, sourceSpaces=None, bemSolution=None, mixedSourceSpaces=None, loose=1, snr=3, fixed=False)[source]

Estimates the sources for the dataset using a Sources_estimator object.

Parameters

raw_dataset : the dataset to be estimated

info : info dictionary for the EEG recordings (see Epochs.info)

tmin_noise, tmax_noise : tmin and tmax for delimiting the noise estimation

trans : path to the coregistration file, if any

sourceSpaces : path to the source spaces file, if any

bemSolution : path to the BEM solution file, if any

mixedSourceSpaces : path to the mixed source spaces file, if any

loose : between 0 and 1. “value that weights the source variances of the dipole components that are parallel (tangential) to the cortical surface”

snr : signal-to-noise ratio value

fixed : Boolean, whether or not to use fixed source orientations normal to the cortical mantle

final_score()[source]

Final scoring function. For each pipeline, a score, confusion matrix, precision, recall, and ROC curve is created.
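The quantities listed above can be illustrated for a two-class problem as follows. This is a generic sketch of the standard definitions, not the library's scoring code; binary_metrics is a hypothetical name and class 1 is assumed to be the positive class.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion matrix, accuracy, precision and recall for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        # Rows are the true class, columns the predicted class.
        "confusion": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics["confusion"])  # [[2, 1], [1, 2]]
```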

fit(x_train, y_train)[source]

Fits all the pipelines to the provided dataset.

Parameters

x_train : the training samples

y_train : the correct answers to the training samples

fit_transform(x, y)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
  • **fit_params (dict) – Additional fit parameters.
Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

independent_features_selection(use_sources=False, channels_to_select=None, use_groups=True)[source]

Applies independent channel or zone selection on the data, without the pipelines. Only the selected channels or zones will be kept on the final data.

Parameters

use_sources : boolean; whether or not to use sources for feature selection

cv : int; number of folds for cross validation

channels_to_select : int; if any, the desired number of features

use_groups : boolean; whether or not to use groups when dividing the data

modify_add_pipeline(name, pipeline, parameters)[source]

Modifies or adds a new pipeline to the catalogue.

Parameters

name : string

name of the pipeline

pipeline : instance of pipeline

the pipeline to be used

parameters : see doc for pipeline catalogue

the notation for the parameters to fit
modify_parameters(name, parameters)[source]

Modifies a pipeline’s parameters by calling the method from the pipeline catalogue.

Parameters

name : string

name of the pipeline

parameters : see doc for pipeline catalogue

the notation for the parameters to fit
plot_ROC(nb_lines, nb_columns)[source]

Plots the ROC curves for all the pipelines from the catalogue.

Parameters

nb_lines : the number of lines to be used for all the pipelines

nb_columns : the number of columns to be used for all the pipelines

plot_ROC_(nb_columns)[source]
plot_confusion(nb_lines, nb_columns, names)[source]

Plots the confusion matrices for all the pipelines from the catalogue.

Parameters

nb_lines : the number of lines to be used for all the pipelines

nb_columns : the number of columns to be used for all the pipelines

names : list; list containing the names of the labels for the categories.

plot_confusion_(nb_columns)[source]
predict(x_test)[source]

Predicts the values for a dataset.

Parameters

x_test : the test samples

predict_proba(x_test)[source]

Predicts with probabilities the values for a dataset.

Parameters

x_test : the test samples

predict_test_dataset()[source]

Predicts the left out test dataset, computes the score and shows the results.

prepare_nonepoched_dataset(directory, nb_subj, tmin, tmax, bads=None, picks=None, filtering=[None, None], ICA=False, resample=False, baseline=None, event_ids=[None, None], reference=None)[source]

Prepare a subjects x epochs x channels x times dataset from raw files

Parameters

directory : the path to the directory containing the dataset

nb_subj : the number of subjects to be considered

tmin, tmax : tmin, tmax for delimiting the epochs in time

bads : list of electrodes to be rejected

picks : list of electrodes to be worked on

filtering : tuple containing the higher and lower frequencies to filter the data

ICA : Boolean, whether or not to apply Independent Component Analysis

resample : boolean, whether or not to resample the data at 512 Hz

baseline : the baseline to be applied to the data

event_ids : The id of the event to consider

reference : the name of the reference to be applied to the data

prepare_nonepoched_dataset_oneFilePerSubj(directory, nb_subj, tmin, tmax, bads=None, picks=None, filtering=[None, None], ICA=False, resample=False, baseline=None, event_ids=[None, None])[source]

Prepare a subjects x epochs x channels x times dataset from raw files

Parameters

directory : the path to the directory containing the dataset

nb_subj : the number of subjects to be considered

tmin, tmax : tmin, tmax for delimiting the epochs in time

bads : list of electrodes to be rejected

picks : list of electrodes to be worked on

filtering : tuple containing the higher and lower frequencies to filter the data

ICA : Boolean, whether or not to apply Independent Component Analysis

resample : boolean, whether or not to resample the data at 512 Hz

baseline : the baseline to be applied to the data

event_ids : The id of the event to consider

reference : the name of the reference to be applied to the data

read_all_files(directory, nb_subj=None, divided_dataset=True, tmin=0, tmax=0.5, bads=None, picks=None, filtering=[None, None], pre_epoched=True, ICA=False, resample=False, baseline=None, event_ids=[None, None], reference=None)[source]

Create a subjects x epochs x channels x times dataset from epoched files.

Parameters

directory : the path to the directory containing the dataset

nb_subj : the number of subjects to be considered

divided_dataset : boolean; whether or not the dataset is divided by classes

tmin, tmax : tmin, tmax for delimiting the epochs in time

bads : list of electrodes to be rejected

picks : list of electrodes to be worked on

filtering : tuple containing the higher and lower frequencies to filter the data

pre_epoched : boolean; whether or not the dataset has been pre-epoched

ICA : Boolean, whether or not to apply Independent Component Analysis

resample : boolean, whether or not to resample the data at 512 Hz

baseline : the baseline to be applied to the data

event_ids : The id of the event to consider

reference : the name of the reference to be applied to the data

read_one_file(file_path, file_name, destination, bads=None, picks=None, filtering=(1, 45), tmin=0, tmax=0.5, ICA=False, resample=False, baseline=None, event_ids=None, reference=None)[source]

Reads one non pre-epoched (raw) set file and saves the result in a -epo.fif file.

Parameters

file_path : path to the file to be opened

file_name : name of the file

destination : path where the -epo.fif file will be saved

bads : list of electrodes to be rejected

picks : list of electrodes to be worked on

filtering : tuple containing the higher and lower frequencies to filter the data

tmin, tmax : tmin, tmax for delimiting the epochs in time

ICA : Boolean, whether or not to apply Independent Component Analysis

resample : boolean, whether or not to resample the data at 512 Hz

baseline : the baseline to be applied to the data

event_ids : The id of the event to consider

reference : the name of the reference to be applied to the data

read_preepoched_oneFilePerSubj(directory, nb_subj, tmin, tmax, bads=None, picks=None, filtering=[None, None], ICA=False, resample=False, baseline=None, event_ids=[None, None], reference=None)[source]

Create a subjects x epochs x channels x times dataset from epoched files.

Parameters

directory : the path to the directory containing the dataset

nb_subj : the number of subjects to be considered

tmin, tmax : tmin, tmax for delimiting the epochs in time

bads : list of electrodes to be rejected

picks : list of electrodes to be worked on

filtering : tuple containing the higher and lower frequencies to filter the data

ICA : Boolean, whether or not to apply Independent Component Analysis

resample : boolean, whether or not to resample the data at 512 Hz

baseline : the baseline to be applied to the data

event_ids : The id of the event to consider

reference : the name of the reference to be applied to the data

restore()[source]

Restores the classifier by restoring the predictions, prediction probabilities, expected answers, scores, confusion matrices, precisions, recalls and ROC information.

save_program_log(path)[source]

Saves the program log in a file.

Parameters

path : string; path to the place where the file will be saved.

score(x, y)[source]

Return the accuracy on the given test data and labels.

Parameters

x : test samples

y : correct answers for x

weights : Sample weights

score_all_pipelines(x, y)[source]

Return the accuracy on the given predictions and labels for all trained pipelines.

Parameters

x : test samples

y : correct answers for x

score_func(estimator, x_test, y_test)[source]

Scoring function for the independent feature selection. The previously trained pipeline is used to predict the test dataset. The predictions are then compared to the correct answers, and the percentage of correctly classified samples is the score.

Parameters

estimator : the estimator that predicts and scores

x_test : the test dataset

y_test : the correct answers to the dataset
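A scorer with this (estimator, x_test, y_test) signature can be sketched as below. The sketch returns the fraction of correct predictions; accuracy_scorer and the toy ThresholdEstimator are hypothetical stand-ins, and the only assumption about the estimator is that it exposes a predict method.

```python
def accuracy_scorer(estimator, x_test, y_test):
    """Fraction of test samples that the estimator classifies correctly."""
    predictions = estimator.predict(x_test)
    correct = sum(p == t for p, t in zip(predictions, y_test))
    return correct / len(y_test)


class ThresholdEstimator:
    """Toy stand-in for a trained pipeline: predicts 1 when the value exceeds 0.5."""

    def predict(self, x):
        return [int(value > 0.5) for value in x]


score = accuracy_scorer(ThresholdEstimator(), [0.1, 0.9, 0.7, 0.2], [0, 1, 0, 0])
print(score)  # 0.75
```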

score_func_independent_feat_selection(estimator, x_test, y_test)[source]

Scoring function for the independent feature selection. The previously trained pipeline is used to predict the test dataset.

The predictions are then compared to the correct answers, and the percentage of correctly classified samples is the score.

Parameters

estimator : the estimator that predicts and scores

x_test : the test dataset

y_test : the correct answers to the dataset

show_results(nb_columns)[source]

Shows the ROC curves and the confusion matrices for all the pipelines.

Parameters

nb_columns : the number of columns to be used for all the pipelines

names : list; list containing the names of the labels for the categories.

tune_hyperparameters(cv, use_sources=False, use_groups=True, factor=None)[source]

Tune the hyperparameters for the different pipelines and replace the pipelines by their improved versions.

Parameters

cv : cross validation for tuning

use_sources : boolean; whether or not to use the sources dataset

use_groups : boolean; whether or not to use groups for the cross validation

factor : int; the step by which to increase the number of filters or electrodes
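The pipeline catalogue (see Pipeline_catalogue.modify_add_pipeline) distinguishes grid search from randomized search. The two strategies differ in how candidate settings are drawn, which the following library-independent sketch illustrates. The helper names and the search space are hypothetical; applePy's own tuning code is not shown here.

```python
import itertools
import random


def grid_candidates(space):
    """Every combination of the listed values (exhaustive grid search)."""
    keys = sorted(space)
    return [
        dict(zip(keys, combo))
        for combo in itertools.product(*(space[key] for key in keys))
    ]


def random_candidates(space, n_iter, seed=0):
    """n_iter settings drawn independently at random (randomized search)."""
    rng = random.Random(seed)
    return [
        {key: rng.choice(values) for key, values in space.items()}
        for _ in range(n_iter)
    ]


# Hypothetical search space, for illustration only.
space = {"n_filters": [2, 4, 6], "n_electrodes": [8, 16]}
grid = grid_candidates(space)
sampled = random_candidates(space, n_iter=4)
print(len(grid), len(sampled))  # 6 4
```

The grid grows with the product of the list lengths, whereas the randomized variant evaluates a fixed number of candidates regardless of the size of the space.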

src.classification.applePy.cnn module

class src.classification.applePy.cnn.CNN[source]

Bases: object

compile()[source]
evaluate(x_test, y_test)[source]
predict(x_test)[source]
show_results()[source]
train(x_train, y_train, x_test, y_test)[source]

src.classification.applePy.pipeline_catalogue module

class src.classification.applePy.pipeline_catalogue.Pipeline_catalogue(used_pipeline=None, channels_selected=False)[source]

Bases: object

Pipeline catalogue is mainly composed of the catalogue and the parameters to fit. The catalogue contains 12 pre-made pipelines, and the parameters to fit contains, for each pipeline, the different parameters that should be tested in order to obtain better results.

change_logReg(new_classifier, channels_selected=None)[source]
delete_pipeline(name)[source]

Deletes a pipeline and the pipeline’s parameters to fit.

Parameters:
  • name (string) – name of the pipeline

modify_add_pipeline(name, pipeline, parameters)[source]

Allows the user to add their own pipeline to the available catalogue, or to modify a pipeline (for example, using their own hyper-values for a pipeline, such as the number of electrodes to select, of CSP filters, of xDAWN filters, etc.).

Code:
  • 0 = no need to tune
  • 1 = Grid Search
  • 2 = Randomized Search

If grid search: parameters = [1, [,,,]]

If randomized search: parameters = [2, [,,,], nb_iter*]

Parameters:
  • name (string) – name of the pipeline
  • pipeline (instance of pipeline) – the pipeline to be used
  • parameters (see above) – the notation for the parameters to fit
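To make the parameters notation concrete, the sketch below reads such an entry and reports which tuning strategy it requests. It is only an interpretation of the codes listed above: describe_parameters is a hypothetical helper, the content of the inner list is left as an opaque placeholder because the source does not specify it, and accepting [0] for the no-tuning case is an assumption.

```python
def describe_parameters(parameters):
    """Summarise a [code, search_space, nb_iter] style entry as plain text."""
    code = parameters[0]
    if code == 0:
        return "no tuning"
    search_space = parameters[1]
    if code == 1:
        return f"grid search over {search_space!r}"
    if code == 2:
        # nb_iter is read only if a third element is present.
        nb_iter = parameters[2] if len(parameters) > 2 else None
        return f"randomized search over {search_space!r}, nb_iter={nb_iter}"
    raise ValueError(f"unknown tuning code: {code!r}")


placeholder_space = ["<search space>"]  # opaque placeholder, format not specified
print(describe_parameters([0]))
print(describe_parameters([1, placeholder_space]))
print(describe_parameters([2, placeholder_space, 10]))
```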
modify_parameters(name, parameters)[source]

Replaces the old parameters for a pipeline by new ones.

Parameters:
  • name (string) – name of the pipeline
  • parameters (see modify_pipeline) – the notation for the parameters to fit

src.classification.applePy.saver module

src.classification.applePy.sources_estimator module

class src.classification.applePy.sources_estimator.Sources_estimator(raw_dataset, info, tmin_noise, tmax_noise, trans=None, sourceSpaces=None, bemSolution=None, mixedSourceSpaces=None, coregistration=None, loose=1, snr=3, fixed=False)[source]

Bases: object

apply_common_average()[source]

Applies a common average reference if not already applied. Raw_dataset format must be: subjects x conditions x epochsEEGLAB. Raw_dataset must be an array.
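A common average reference subtracts, at every time sample, the mean across channels from each channel. A minimal pure-Python sketch of that operation on a single epoch; the function name is hypothetical and the method itself operates on the dataset structure described above:

```python
def common_average_reference(epoch):
    """Re-reference one epoch (list of channels, each a list of samples)."""
    n_channels = len(epoch)
    n_times = len(epoch[0])
    # Mean across channels at each time sample.
    means = [sum(channel[t] for channel in epoch) / n_channels for t in range(n_times)]
    return [[channel[t] - means[t] for t in range(n_times)] for channel in epoch]


epoch = [[1.0, 4.0], [3.0, 0.0]]  # 2 channels x 2 time samples
referenced = common_average_reference(epoch)
print(referenced)  # [[-1.0, 2.0], [1.0, -2.0]]
```

After re-referencing, the channels sum to zero at every time sample.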

compute_noise_covariances()[source]

Computes the noise covariances for the raw data. Raw_dataset format must be: subjects x conditions x epochsEEGLAB. The limits of the noise must be provided.

create_inverse_operator(forwardSolution, noise_covariances, depth, fixed)[source]

Calls the inverse operator method from mne with the chosen arguments. Returns the inverse operator.

static create_labels()[source]
estimate_sources()[source]

Call all the above methods in order to estimate the sources on the dataset.

extract_labels_from_sources(labelsParc, inverseOperator)[source]
open_make_forward_solutions()[source]

Opens the forward solution parts from the default files or the specified files. Then, creates the forward solution from the three previously opened parts.

save_sources(inverse_operators)[source]

Computes the sources for all subjects and conditions. Default estimated snr is 3. Raw_dataset format must be: subjects x conditions x epochsEEGLAB.

src.classification.applePy.tools module

class src.classification.applePy.tools.CospBoostingClassifier(baseclf)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Co-spectral matrix bagging. Source from Cédric Simar.

fit(X, y)[source]
predict_proba(X)[source]
transform(X)[source]
class src.classification.applePy.tools.DownSampler(factor=4)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Downsample transformer. Source from Cédric Simar.

fit(X, y)[source]
transform(X)[source]
class src.classification.applePy.tools.EpochsVectorizer[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Vectorize epochs. Source from Cédric Simar.

fit(X, y)[source]
transform(X)[source]
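The source does not document what EpochsVectorizer.transform returns. One common reading of "vectorize epochs" is to flatten each epoch of shape (n_channels, n_times) into a single feature vector; the sketch below shows that reading only, with a hypothetical function name.

```python
def vectorize_epochs(epochs):
    """Flatten each (n_channels, n_times) epoch into one feature vector."""
    return [[value for channel in epoch for value in channel] for epoch in epochs]


# Two epochs, two channels, three time samples each.
epochs = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
vectors = vectorize_epochs(epochs)
print(vectors)  # [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
```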
class src.classification.applePy.tools.PSDfiltering(frequencies=array([[ 1, 4], [ 4, 8], [ 8, 15], [15, 20], [30, 40]]), sampling_freq=512, overlap=0.25)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Power Spectral Density class. Code inspired from Cédric Simar.

compute_power_spectral_density(windowed_signal, psd_freqs, sampling_freq, overlap)[source]

Computes the PSD of each of the 32 electrodes and forms a binned spectrogram of 5 frequency bands. Returns the log10 of the 32 spectrograms.
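The binning step can be illustrated for a single electrode as follows, using the five default bands from the class signature. This is a sketch under assumptions that the source does not confirm: the power inside each band is averaged, band edges are treated as half-open intervals [low, high), and log_band_power is a hypothetical name. The PSD values are synthetic.

```python
import math

# Default bands from the PSDfiltering signature, in Hz.
BANDS = [(1, 4), (4, 8), (8, 15), (15, 20), (30, 40)]


def log_band_power(freqs, psd, bands=BANDS):
    """log10 of the mean PSD inside each frequency band, for one electrode."""
    features = []
    for low, high in bands:
        values = [power for freq, power in zip(freqs, psd) if low <= freq < high]
        if not values:
            raise ValueError(f"no PSD bin falls inside the band {low}-{high} Hz")
        features.append(math.log10(sum(values) / len(values)))
    return features


# Synthetic PSD with 1 Hz resolution: more power below 8 Hz than above.
freqs = list(range(1, 41))
psd = [100.0 if freq < 8 else 10.0 for freq in freqs]
features = log_band_power(freqs, psd)
print(features)
```

Repeating this for every electrode gives one five-value feature vector per electrode.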

fit(X, y)[source]
transform(X)[source]
src.classification.applePy.tools.source_estimation()[source]

src.classification.applePy.transformer module

Module contents