src.classification.applePy package¶
Submodules¶
src.classification.applePy.channel_selection module¶
Code for channel selection.
class src.classification.applePy.channel_selection.ElectrodeSelection(nelec=16, metric='riemann', n_jobs=1)[source]¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Channel selection based on a Riemannian geometry criterion.
For each class, a centroid is estimated, and the channel selection is based on the maximization of the distance between centroids. This is done by backward elimination: at each iteration, the electrode that contributes the least to the inter-centroid distance is removed from the subset. This algorithm is described in [1].
Parameters
nelec : int (default 16)
the number of electrodes to keep in the final subset.
metric : string | dict (default: ‘riemann’)
The type of metric used for centroid and distance estimation. See mean_covariance for the list of supported metrics. The metric can be a dict with two keys, mean and distance, in order to pass different metrics for the centroid estimation and the distance estimation. A typical use case is to pass the ‘logeuclid’ metric for the mean in order to speed up computation and ‘riemann’ for the distance in order to keep good sensitivity for the selection.
n_jobs : int (default: 1)
The number of jobs to use for the computation. This works by computing each class centroid in parallel. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) CPUs are used. Thus for n_jobs = -2, all CPUs but one are used.
Attributes
covmeans_ : list
the class centroids.
dist_ : list
list of distances at each iteration.
References
[1] A. Barachant and S. Bonnet, “Channel selection procedure using riemannian distance for BCI applications,” in 2011 5th International IEEE/EMBS Conference on Neural Engineering (NER), 2011, 348-351
fit(X, y=None, sample_weight=None)[source]¶
Find the optimal subset of electrodes.
Parameters
X : ndarray, shape (n_trials, n_channels, n_channels)
ndarray of SPD matrices.
y : ndarray, shape (n_trials, 1)
labels corresponding to each trial.
sample_weight : None | ndarray, shape (n_trials, 1)
the weights of each sample. If None, each sample is treated with equal weight.
Returns
self : ElectrodeSelection instance
The ElectrodeSelection instance.
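A minimal usage sketch follows. It assumes that, as a TransformerMixin, the class also exposes a transform method that keeps only the selected channels, and it uses toy covariance matrices in place of real EEG data:

import numpy as np
from src.classification.applePy.channel_selection import ElectrodeSelection

# toy data: 40 trials over 32 channels, 100 time samples, two classes
rng = np.random.default_rng(0)
trials = rng.standard_normal((40, 32, 100))
covs = np.array([np.cov(trial) for trial in trials])  # (40, 32, 32) SPD matrices
y = np.array([0] * 20 + [1] * 20)

# logeuclid mean for speed, riemann distance for sensitivity (as documented above)
sel = ElectrodeSelection(nelec=16, metric={'mean': 'logeuclid', 'distance': 'riemann'})
sel.fit(covs, y)
covs_reduced = sel.transform(covs)  # assumed to return (40, 16, 16) matrices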
src.classification.applePy.channel_selection_without_covariances module¶
Code for channel selection.
src.classification.applePy.classifier module¶
class src.classification.applePy.classifier.ApplePyClassifier(used_pipelines=None)[source]¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Global class dealing with automated classification. Inherits BaseEstimator and TransformerMixin in order to make it adaptable to other estimators from other libraries. This class contains:
- A pipeline catalogue with all the pipelines and the important information about the parameters to fit
- All the predictions and the prediction probabilities for all pipelines
- All the correct answers
- All the scores, confusion matrices, precisions, recalls, and ROC information for all pipelines
- The dataset and the labels, as well as source dataset
- The group indices
- The eventual test dataset, labels and groups
classify(dataset, dataset_path=None, test_dataset_size=5, cv_value=5, independent_features_selection=False, channels_to_select=20, use_groups=True, tune_hypers=False, classify_test=False)[source]¶
Global classification method of the library (inter-subjects).
Reads the dataset, can apply independent features selection, can tune hyperparameters, fits, predicts, scores, and shows results.
Parameters
dataset : string; the path to the dataset
test_dataset_size : int; the fraction of the dataset to be considered for testing
pre_epoched : boolean; whether the dataset is already epoched or not
tmin, tmax : time limits for the epochs
bads : list of electrodes to be rejected
picks : list of electrodes to be worked on
filtering : tuple containing the higher and lower frequencies to filter the data
ICA : Boolean, whether to apply Independent Component Analysis or not
resample : boolean, whether to resample the data at 512 Hz or not
baseline : the baseline to be applied to the data
cv_value : int; the number of folds for cross validation. If None, the number of subjects will be used (Leave one out)
independent_features_selection : boolean; whether to apply independent features selection or not
channels_to_select : int or None; the number of channels to be selected or None if automatic number selection
use_groups : boolean; whether to use groups for cross validation or not
tune_hypers : boolean; whether to tune hyperparameters or not
names : list; names of the categories
classify_test : boolean; whether there should be a separate test dataset or not
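A hedged usage sketch for an inter-subject run follows; the dataset path is hypothetical and the keyword values echo the defaults documented above:

from src.classification.applePy.classifier import ApplePyClassifier

clf = ApplePyClassifier()
clf.classify(
    dataset="data/my_dataset",           # hypothetical path to the dataset
    test_dataset_size=5,                 # fraction of the data reserved for testing (as documented)
    cv_value=5,                          # 5-fold cross validation
    independent_features_selection=True,
    channels_to_select=20,
    use_groups=True,                     # group the folds by subject
    tune_hypers=False,
    classify_test=True,                  # evaluate on the held-out test dataset
)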
classify_intraSubject(dataset, divided_dataset=True, nb_subj=None, test_dataset_size=5, pre_epoched=True, tmin=-0.2, tmax=0.5, bads=None, picks=None, filtering=[None, None], ICA=False, resample=False, baseline=None, event_ids=[None, None], reference=None, cv_value=5, independent_features_selection=False, channels_to_select=20, tune_hypers=False, names=[0, 1], classify_test=False, use_all_pipelines=False)[source]¶
Global classification method of the library (intra-subject).
Reads the dataset, can apply independent features selection, can tune hyperparameters, fits, predicts, scores, and shows results.
Parameters
dataset : string; the path to the dataset
nb_subj : int; number of subjects to consider
test_dataset_size : int; the fraction of the dataset to be considered for testing
pre_epoched : boolean; whether or not the dataset is already epoched
tmin, tmax : time limits for the epochs
bads : list of electrodes to be rejected
picks : list of electrodes to be worked on
filtering : tuple containing the higher and lower frequencies to filter the data
ICA : Boolean, whether or not to apply Independent Component Analysis
resample : boolean, whether or not to resample the data at 512 Hz
baseline : the baseline to be applied to the data
cv_value : int; the number of folds for cross validation. If None, the number of subjects will be used (Leave one out)
independent_features_selection : boolean; whether or not to apply independent features selection
channels_to_select : int or None; the number of channels to be selected or None if automatic number selection
tune_hypers : boolean; whether or not to tune hyperparameters
names : list; names of the categories
classify_test : boolean; whether or not there should be a separate test dataset
use_all_pipelines : boolean; whether or not all the pipelines should be used for classification or only a subset
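A hedged sketch of an intra-subject run on a pre-epoched dataset; the path and category names are hypothetical:

from src.classification.applePy.classifier import ApplePyClassifier

clf = ApplePyClassifier()
clf.classify_intraSubject(
    dataset="data/my_dataset",       # hypothetical path
    divided_dataset=True,            # dataset is divided by classes
    pre_epoched=True,
    tmin=-0.2, tmax=0.5,             # epoch limits in seconds
    cv_value=5,
    names=["standard", "deviant"],   # hypothetical category names
    use_all_pipelines=False,
)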
classify_with_CNN(dataset, nb_subj=None, divided_dataset=True, pre_epoched=True, tmin=0, tmax=0.5, bads=None, picks=None, filtering=[None, None], ICA=False, resample=False, baseline=None, event_ids=[None, None], reference=None, test_size=5)[source]¶
Global classification method of the library (inter-subjects).
Reads the dataset, can apply independent features selection, can tune hyperparameters, fits, predicts, scores, and shows results.
Parameters
dataset : string; the path to the dataset
nb_subj : int; number of subjects to consider
test_dataset_size : int; the fraction of the dataset to be considered for testing
pre_epoched : boolean; whether or not the dataset is already epoched
tmin, tmax : time limits for the epochs
bads : list of electrodes to be rejected
picks : list of electrodes to be worked on
filtering : tuple containing the higher and lower frequencies to filter the data
ICA : Boolean, whether or not to apply Independent Component Analysis
resample : boolean, whether or not to resample the data at 512 Hz
baseline : the baseline to be applied to the data
cv_value : int; the number of folds for cross validation. If None, the number of subjects will be used (Leave one out)
independent_features_selection : boolean; whether or not to apply independent features selection
channels_to_select : int or None; the number of channels to be selected or None if automatic number selection
use_groups : boolean; whether or not to use groups for cross validation
tune_hypers : boolean; whether or not to tune hyperparameters
names : list; names of the categories
classify_test : boolean; whether or not there should be a separate test dataset
use_all_pipelines : boolean; whether or not all the pipelines should be used for classification or only a subset
count_subjects(directory)[source]¶
Count the number of subjects in the dataset.
Parameters
directory : the path to the dataset
create_folder(k)[source]¶
Creates the k-folds folder for the data.
Parameters
k : int
number of folds
delete_pipelines(pipeline_names)[source]¶
Deletes a pipeline by calling the method from the pipeline catalogue. Additionally, deletes all occurrences of the pipeline.
Parameters
name : string
name of the pipeline
estimate_covariance_matrices(dataset)[source]¶
Creates the simple covariance matrices for the raw data.
Result format = nb_subj x nb_epochs x nb_channels x nb_channels
Parameters
dataset : the dataset on which to estimate the covariance matrices.
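For reference, the documented shape convention can be reproduced with plain numpy; the snippet below is an illustrative stand-in, not the library's implementation:

import numpy as np

# dataset assumed shaped (nb_subj, nb_epochs, nb_channels, nb_times)
dataset = np.random.randn(3, 10, 32, 256)

# one covariance matrix per epoch -> (nb_subj, nb_epochs, nb_channels, nb_channels)
cov_matrices = np.array([
    [np.cov(epoch) for epoch in subject]
    for subject in dataset
])
print(cov_matrices.shape)  # (3, 10, 32, 32)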
estimate_sources(raw_dataset, info, tmin_noise, tmax_noise, trans=None, sourceSpaces=None, bemSolution=None, mixedSourceSpaces=None, loose=1, snr=3, fixed=False)[source]¶
Estimates the sources for the dataset using a Sources_estimator object.
Parameters
raw_dataset : the dataset to be estimated
info : info dictionary for the EEG recordings (see Epochs.info)
tmin_noise, tmax_noise : tmin and tmax for delimiting the noise estimation
trans : path to the optional coregistration file
sourceSpaces : path to the optional source spaces file
bemSolution : path to the optional BEM solution file
mixedSourceSpaces : path to the optional mixed source spaces file
loose : between 0 and 1. “value that weights the source variances of the dipole components that are parallel (tangential) to the cortical surface”
snr : signal-to-noise ratio value
fixed : Boolean, whether or not to use fixed source orientations normal to the cortical mantle
final_score()[source]¶
Final scoring function. For each pipeline, a score, confusion matrix, precision, recall, and ROC curve are computed.
fit(x_train, y_train)[source]¶
Fits all the pipelines to the provided dataset.
Parameters
x_train : the training samples
y_train : the correct answers to the training samples
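A hedged sketch of how fit, predict_proba and the scoring methods might be chained; the placeholder arrays only illustrate the call order, and the expected feature shape depends on the pipelines in the catalogue:

import numpy as np
from sklearn.model_selection import train_test_split
from src.classification.applePy.classifier import ApplePyClassifier

# placeholder data standing in for features produced by the reading methods below
features = np.random.randn(100, 64)
labels = np.random.randint(0, 2, size=100)
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.2)

clf = ApplePyClassifier()
clf.fit(x_train, y_train)             # fit every pipeline in the catalogue
probas = clf.predict_proba(x_test)    # per-pipeline class probabilities
clf.score_all_pipelines(x_test, y_test)
clf.final_score()                     # scores, confusion matrices, precisions, recalls, ROC info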
fit_transform(x, y)[source]¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (array-like of shape (n_samples, n_features)) – Input samples.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
- **fit_params (dict) – Additional fit parameters.
Returns: X_new – Transformed array.
Return type: ndarray array of shape (n_samples, n_features_new)
independent_features_selection(use_sources=False, channels_to_select=None, use_groups=True)[source]¶
Applies independent channel or zone selection on the data, without the pipelines. Only the selected channels or zones will be kept in the final data.
Parameters
use_sources : boolean; whether or not to use sources for feature selection
cv : int; number of folds for cross validation
channels_to_select : int; if any, the desired number of features
use_groups : boolean; whether or not to use groups when dividing the data
modify_add_pipeline(name, pipeline, parameters)[source]¶
Modifies or adds a new pipeline to the catalogue.
Parameters
name : string
name of the pipeline
pipeline : instance of pipeline
the pipeline to be used
parameters : see doc for pipeline catalogue
the notation for the parameters to fit
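A hedged sketch of registering an extra pipeline; the pipeline, its name, and the inner structure of the parameter notation are assumptions for illustration (see the Pipeline_catalogue documentation below for the notation codes):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from src.classification.applePy.classifier import ApplePyClassifier

clf = ApplePyClassifier()
my_pipeline = make_pipeline(StandardScaler(), SVC(probability=True))

# [1, [...]] is assumed to mean "grid search over the listed values"
clf.modify_add_pipeline("scaled_svm", my_pipeline, [1, [{"svc__C": [0.1, 1, 10]}]])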
modify_parameters(name, parameters)[source]¶
Modifies a pipeline’s parameters by calling the method from the pipeline catalogue.
Parameters
name : string
name of the pipeline
parameters : see doc for pipeline catalogue
the notation for the parameters to fit
plot_ROC(nb_lines, nb_columns)[source]¶
Plots the ROC curves for all the pipelines from the catalogue.
Parameters
nb_lines : the number of lines to be used for all the pipelines
nb_columns : the number of columns to be used for all the pipelines
plot_confusion(nb_lines, nb_columns, names)[source]¶
Plots the confusion matrices for all the pipelines from the catalogue.
Parameters
nb_lines : the number of lines to be used for all the pipelines
nb_columns : the number of columns to be used for all the pipelines
names : list; list containing the names of the labels for the categories.
predict_proba(x_test)[source]¶
Predicts the class probabilities for a dataset.
Parameters
x_test : the test samples
predict_test_dataset()[source]¶
Predicts the left out test dataset, computes the score and shows the results.
prepare_nonepoched_dataset(directory, nb_subj, tmin, tmax, bads=None, picks=None, filtering=[None, None], ICA=False, resample=False, baseline=None, event_ids=[None, None], reference=None)[source]¶
Prepare a subjects x epochs x channels x times dataset from raw files.
Parameters
directory : the path to the directory containing the dataset
nb_subj : the number of subjects to be considered
tmin, tmax : tmin, tmax for delimiting the epochs in time
bads : list of electrodes to be rejected
picks : list of electrodes to be worked on
filtering : tuple containing the higher and lower frequencies to filter the data
ICA : Boolean, whether or not to apply Independent Component Analysis
resample : boolean, whether or not to resample the data at 512 Hz
baseline : the baseline to be applied to the data
event_ids : The id of the event to consider
reference : the name of the reference to be applied to the data
prepare_nonepoched_dataset_oneFilePerSubj(directory, nb_subj, tmin, tmax, bads=None, picks=None, filtering=[None, None], ICA=False, resample=False, baseline=None, event_ids=[None, None])[source]¶
Prepare a subjects x epochs x channels x times dataset from raw files.
Parameters
directory : the path to the directory containing the dataset
nb_subj : the number of subjects to be considered
tmin, tmax : tmin, tmax for delimiting the epochs in time
bads : list of electrodes to be rejected
picks : list of electrodes to be worked on
filtering : tuple containing the higher and lower frequencies to filter the data
ICA : Boolean, whether or not to apply Independent Component Analysis
resample : boolean, whether or not to resample the data at 512 Hz
baseline : the baseline to be applied to the data
event_ids : The id of the event to consider
reference : the name of the reference to be applied to the data
read_all_files(directory, nb_subj=None, divided_dataset=True, tmin=0, tmax=0.5, bads=None, picks=None, filtering=[None, None], pre_epoched=True, ICA=False, resample=False, baseline=None, event_ids=[None, None], reference=None)[source]¶
Create a subjects x epochs x channels x times dataset from epoched files.
Parameters
directory : the path to the directory containing the dataset
nb_subj : the number of subjects to be considered
divided_dataset : boolean; whether or not the dataset is divided by classes
tmin, tmax : tmin, tmax for delimiting the epochs in time
bads : list of electrodes to be rejected
picks : list of electrodes to be worked on
filtering : tuple containing the higher and lower frequencies to filter the data
pre_epoched : boolean; whether or not the dataset has been pre-epoched
ICA : Boolean, whether or not to apply Independent Component Analysis
resample : boolean, whether or not to resample the data at 512 Hz
baseline : the baseline to be applied to the data
event_ids : The id of the event to consider
reference : the name of the reference to be applied to the data
read_one_file(file_path, file_name, destination, bads=None, picks=None, filtering=(1, 45), tmin=0, tmax=0.5, ICA=False, resample=False, baseline=None, event_ids=None, reference=None)[source]¶
Reads one non pre-epoched (raw) set file and saves the result in a -epo.fif file.
Parameters
file_path : path to the file to be opened
file_name : name of the file
destination : path where the -epo.fif file will be saved
bads : list of electrodes to be rejected
picks : list of electrodes to be worked on
filtering : tuple containing the higher and lower frequencies to filter the data
tmin, tmax : tmin, tmax for delimiting the epochs in time
ICA : Boolean, whether or not to apply Independent Component Analysis
resample : boolean, whether or not to resample the data at 512 Hz
baseline : the baseline to be applied to the data
event_ids : The id of the event to consider
reference : the name of the reference to be applied to the data
read_preepoched_oneFilePerSubj(directory, nb_subj, tmin, tmax, bads=None, picks=None, filtering=[None, None], ICA=False, resample=False, baseline=None, event_ids=[None, None], reference=None)[source]¶
Create a subjects x epochs x channels x times dataset from epoched files.
Parameters
directory : the path to the directory containing the dataset
nb_subj : the number of subjects to be considered
tmin, tmax : tmin, tmax for delimiting the epochs in time
bads : list of electrodes to be rejected
picks : list of electrodes to be worked on
filtering : tuple containing the higher and lower frequencies to filter the data
ICA : Boolean, whether or not to apply Independent Component Analysis
resample : boolean, whether or not to resample the data at 512 Hz
baseline : the baseline to be applied to the data
event_ids : The id of the event to consider
reference : the name of the reference to be applied to the data
restore()[source]¶
Restores the classifier by restoring the predictions, prediction probabilities, expected answers, scores, confusion matrices, precisions, recalls and ROC info.
save_program_log(path)[source]¶
Saves the program log in a file.
Parameters
path : string; path to the place where the file will be saved.
score(x, y)[source]¶
Return the accuracy on the given test data and labels.
Parameters
x : test samples
y : correct answers for x
weights : Sample weights
score_all_pipelines(x, y)[source]¶
Return the accuracy on the given predictions and labels for all trained pipelines.
Parameters
x : test samples
y : correct answers for x
score_func(estimator, x_test, y_test)[source]¶
Scoring function for the independent feature selection. The previously trained pipeline is used to predict the test dataset. At the end, the predicted data are compared to the correct answers, and the percentage of correctly classified data is considered the score.
Parameters
estimator : the estimator that predicts and scores
x_test : the test dataset
y_test : the correct answers to the dataset
score_func_independent_feat_selection(estimator, x_test, y_test)[source]¶
Scoring function for the independent feature selection. The previously trained pipeline is used to predict the test dataset.
At the end, the predicted data are compared to the correct answers, and the percentage of correctly classified data is considered the score.
Parameters
estimator : the estimator that predicts and scores
x_test : the test dataset
y_test : the correct answers to the dataset
show_results(nb_columns)[source]¶
Shows the ROC curves and the confusion matrices for all the pipelines.
Parameters
nb_columns : the number of columns to be used for all the pipelines
names : list; list containing the names of the labels for the categories.
tune_hyperparameters(cv, use_sources=False, use_groups=True, factor=None)[source]¶
Tune the hyperparameters for the different pipelines and replace the pipelines by their improved versions.
Parameters
cv : cross validation for tuning
use_sources : boolean; whether or not to use the sources dataset
use_groups : boolean; whether or not to use groups for the cross validation
factor : int; the step by which to augment the number of filters or electrodes
src.classification.applePy.cnn module¶
src.classification.applePy.pipeline_catalogue module¶
class src.classification.applePy.pipeline_catalogue.Pipeline_catalogue(used_pipeline=None, channels_selected=False)[source]¶
Bases: object
Pipeline catalogue is mainly composed of the catalogue and the parameters to fit. The catalogue contains 12 pre-made pipelines, and the parameters to fit contains, for each pipeline, the different parameters that should be tested in order to obtain better results.
delete_pipeline(name)[source]¶
Deletes a pipeline and the pipeline’s parameters to fit.
Parameters
name : string
name of the pipeline
modify_add_pipeline(name, pipeline, parameters)[source]¶
Allows the user to add their own pipeline to the available catalogue, or to modify a pipeline (for example, using their own hyper-values for a pipeline, such as the number of electrodes to select, of CSP filters, of xDAWN filters, etc.).
Code: 0 = no need to tune, 1 = Grid Search, 2 = Randomized Search.
If grid search: parameters = [1, [,,,]]. If randomized search: parameters = [2, [,,,], nb_iter*].
Parameters: - name (string) – name of the pipeline
- pipeline (instance of pipeline) – the pipeline to be used
- parameters (see above) – the notation for the parameters to fit
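A hedged illustration of this notation; the pipeline, names, and the inner layout of the parameter grids are assumptions, not catalogue entries:

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from src.classification.applePy.pipeline_catalogue import Pipeline_catalogue

catalogue = Pipeline_catalogue()
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# 1 = grid search over the listed values (inner grid structure assumed)
catalogue.modify_add_pipeline("tuned_logreg", pipe,
                              [1, [{"logisticregression__C": [0.1, 1, 10]}]])

# 2 = randomized search, with the number of iterations appended (assumed)
catalogue.modify_add_pipeline("random_logreg", pipe,
                              [2, [{"logisticregression__C": [0.1, 1, 10]}], 20])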
src.classification.applePy.saver module¶
src.classification.applePy.sources_estimator module¶
class src.classification.applePy.sources_estimator.Sources_estimator(raw_dataset, info, tmin_noise, tmax_noise, trans=None, sourceSpaces=None, bemSolution=None, mixedSourceSpaces=None, coregistration=None, loose=1, snr=3, fixed=False)[source]¶
Bases: object
apply_common_average()[source]¶
Applies a common average reference if not already applied. Raw_dataset format must be: subjects x conditions x epochs (EEGLAB). Raw_dataset must be an array.
compute_noise_covariances()[source]¶
Computes the noise covariances for the raw data. Raw_dataset format must be: subjects x conditions x epochs (EEGLAB). The limits of the noise must be provided.
create_inverse_operator(forwardSolution, noise_covariances, depth, fixed)[source]¶
Calls the inverse operator method from mne with the chosen arguments. Returns the inverse operator.
estimate_sources()[source]¶
Calls all the above methods in order to estimate the sources on the dataset.
src.classification.applePy.tools module¶
class src.classification.applePy.tools.CospBoostingClassifier(baseclf)[source]¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Co-spectral matrix bagging. Source from Cédric Simar.
class src.classification.applePy.tools.DownSampler(factor=4)[source]¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Downsample transformer. Source from Cédric Simar.
class src.classification.applePy.tools.EpochsVectorizer[source]¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Vectorize epochs. Source from Cédric Simar.
class src.classification.applePy.tools.PSDfiltering(frequencies=array([[1, 4], [4, 8], [8, 15], [15, 20], [30, 40]]), sampling_freq=512, overlap=0.25)[source]¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Power Spectral Density class. Code inspired from Cédric Simar.
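A hedged sketch of how this transformer might be slotted into a scikit-learn pipeline; the band array matches the documented default, but the composition with EpochsVectorizer and a linear classifier is an assumption for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from src.classification.applePy.tools import EpochsVectorizer, PSDfiltering

# frequency bands matching the documented default
bands = np.array([[1, 4], [4, 8], [8, 15], [15, 20], [30, 40]])

# assumed composition: band-wise PSD features, flattened, then a linear classifier
pipe = make_pipeline(
    PSDfiltering(frequencies=bands, sampling_freq=512, overlap=0.25),
    EpochsVectorizer(),
    LogisticRegression(),
)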