Preprocessing¶
Normalize¶
- naplib.preprocessing.normalize(data=None, field='resp', axis=0, method='zscore', nan_policy='propagate')[source]¶
Normalize data over multiple trials. To normalize a single matrix, pass it inside a list or with an extra first dimension.
- Parameters:
data (naplib.Data object, optional) -- Data object containing the data to be normalized in one of its fields. If not given, the data to be normalized must be passed directly as a list of trial arrays to the field argument instead of a string.
field (string | list of np.ndarrays or a multidimensional np.ndarray) -- Field to normalize. If a string, it must specify one of the fields of the Data provided in the first argument. If a multidimensional array, the first dimension indicates the trials/instances, which will be concatenated over to compute normalization statistics. If a list, all arrays will be concatenated over axis.
axis (int or None, default=0) -- Axis of the array to concatenate over before normalizing over this same axis. If None, computes statistics over all dimensions jointly, thus normalizing over the global statistics.
method (string, default='zscore') -- Method of normalization. Must be one of ['zscore','center']. 'center' only centers the data, while 'zscore' also scales by standard deviation
nan_policy (string, default='propagate') -- One of {'propagate','omit','raise'}. Defines how to handle when input contains nan. 'propagate' returns nan, 'raise' throws an error, 'omit' performs the calculations ignoring nan values. Default is 'propagate'. Note that when the value is 'omit', nans in the input also propagate to the output, but they do not affect the z-scores computed for the non-nan values.
- Returns:
normalized_data
- Return type:
list of np.ndarrays
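The pooling behaviour described above can be illustrated with a minimal NumPy sketch. It does not call naplib; the helper name zscore_over_trials is hypothetical, it only handles an integer axis, and it ignores nan_policy.

```python
import numpy as np

def zscore_over_trials(trials, axis=0, method="zscore"):
    """Hypothetical helper: normalize a list of trial arrays with shared statistics."""
    # Pool all trials along `axis` so the statistics are shared across trials.
    pooled = np.concatenate(trials, axis=axis)
    mean = pooled.mean(axis=axis, keepdims=True)
    std = pooled.std(axis=axis, keepdims=True)
    if method == "center":
        # 'center' only removes the mean.
        return [t - mean for t in trials]
    # 'zscore' also scales by the standard deviation.
    return [(t - mean) / std for t in trials]

rng = np.random.default_rng(0)
# Two trials of different lengths, each shaped (time, channels).
trials = [rng.normal(5.0, 2.0, size=(100, 3)), rng.normal(5.0, 2.0, size=(150, 3))]
normalized = zscore_over_trials(trials)
pooled_out = np.concatenate(normalized, axis=0)
```

Because the statistics are computed on the concatenated trials, the pooled output has zero mean and unit variance per channel, while an individual trial generally does not.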
Rereference Data¶
- naplib.preprocessing.rereference(arr, data=None, field='resp', method='avg', return_reference=False)[source]¶
Rereference responses based on the specification of a connection array defining which electrodes should be used to define the "reference" for each electrode.
- Parameters:
arr (np.ndarray) -- Square matrix defining connections between electrodes and their groupings. arr should have dtype float, but can be entirely 1s and 0s or can encode weights with intermediate values. This can be created by one of the helper functions, like make_contact_rereference_arr.
data (naplib.Data object, optional) -- Data object containing the data to be rereferenced in one of its fields. If not given, the data to be rereferenced must be passed directly as a list of trial arrays to the field argument instead of a string.
field (string | list of np.ndarrays or a multidimensional np.ndarray, default='resp') -- Field to rereference. If a string, it must specify one of the fields of the Data provided in the first argument. If a multidimensional array, the first dimension indicates the trials/instances. If a list, each array must be a multidimensional array of shape (time_i, channels).
method (string, default='avg') -- Method for computing the reference over a group of electrodes. Options are 'avg' (average), 'med' (median), or 'pca' (first principal component). Note that the PCA method whitens the responses first.
return_reference (bool, default=False) -- If True, also return the reference computed for each electrode, which will be a list of numpy arrays, just like the rereferenced_data. So rereferenced_data[i]+reference[i] will reproduce field[i].
- Returns:
rereferenced_data (list of np.ndarrays) -- Re-referenced data.
reference (list of np.ndarrays) -- Reference for each electrode. Only returned if return_reference=True.
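As an illustration of the 'avg' method, the sketch below subtracts a per-electrode reference built from a connection array. It uses plain NumPy rather than naplib, the helper name rereference_avg is hypothetical, and it assumes that each row of arr is normalized to sum to one before averaging, which may differ from the library's internals.

```python
import numpy as np

def rereference_avg(arr, trials):
    """Hypothetical sketch: subtract a weighted-average reference per electrode."""
    # Row i of `arr` weights the electrodes that form the reference for electrode i.
    # Assumption: rows are normalized so the weights sum to one.
    weights = arr / arr.sum(axis=1, keepdims=True)
    references = [trial @ weights.T for trial in trials]
    rereferenced = [trial - ref for trial, ref in zip(trials, references)]
    return rereferenced, references

rng = np.random.default_rng(0)
# Two trials shaped (time_i, channels) with 4 electrodes.
trials = [rng.normal(size=(200, 4)), rng.normal(size=(120, 4))]
# Electrodes 0-1 form one group and electrodes 2-3 another.
arr = np.array([[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)
rereferenced, references = rereference_avg(arr, trials)
```

Adding references[i] back to rereferenced[i] recovers the original trial, which mirrors the relationship stated for return_reference.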
Make Rereference Array for Contacts¶
- naplib.preprocessing.make_contact_rereference_arr(channelnames, extent=None, grid_sizes={})[source]¶
Create a grid that defines a re-referencing scheme based on which electrodes are on the same contact as each other.
- Parameters:
channelnames (list or array-like) -- Channel name of each electrode. Names must follow this scheme: 1) All channel names must be alphanumeric, with any numbers only on the right. 2) The numeric portion specifies the electrode number, while the character portion on the left of the channel name specifies the contact name. E.g. ['RT1','RT2','RT3','Ls1','Ls2'] indicates two contacts, the first with 3 electrodes and the second with 2 electrodes.
extent (int, optional, default=None) -- If provided, then only contacts from the same group which are within extent electrodes away from each other (inclusive) are still grouped together. For example, if extent=1, only the nearest electrode on either side of a given electrode on the same contact is still grouped with it. This extent=1 produces the traditional local average reference scheme. The default extent=None produces the traditional common average reference scheme.
grid_sizes (dict, optional, default={}) -- If provided, contains {'contact_name': (nrow, ncol)} values for any known ECoG grid sizes. E.g. {'GridA': (8, 16)} indicates that electrodes on contact 'GridA' are arranged in an 8 x 16 grid, which is needed to determine adjacent electrodes for local average referencing with extent >= 1.
- Returns:
arr -- Square matrix of rereference connections.
- Return type:
np.ndarray
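The naming scheme and the effect of extent can be sketched in a few lines of Python. This is not the library's implementation: the helper contact_reref_arr is hypothetical, it ignores grid_sizes (so it only covers linear contacts), and it assumes each electrode is included in its own group.

```python
import re
import numpy as np

def contact_reref_arr(channelnames, extent=None):
    """Hypothetical sketch for linear (non-grid) contacts only."""
    parsed = []
    for name in channelnames:
        # Letters on the left give the contact name, digits on the right the electrode number.
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
        if match is None:
            raise ValueError(f"Unexpected channel name: {name!r}")
        parsed.append((match.group(1), int(match.group(2))))

    n = len(parsed)
    arr = np.zeros((n, n), dtype=float)
    for i, (contact_i, num_i) in enumerate(parsed):
        for j, (contact_j, num_j) in enumerate(parsed):
            same_contact = contact_i == contact_j
            # extent=None groups the whole contact; otherwise limit by electrode distance.
            in_range = extent is None or abs(num_i - num_j) <= extent
            if same_contact and in_range:
                arr[i, j] = 1.0
    return arr

names = ["RT1", "RT2", "RT3", "Ls1", "Ls2"]
car = contact_reref_arr(names)            # common average within each contact
lar = contact_reref_arr(names, extent=1)  # nearest neighbours only
```

With extent=1, 'RT1' and 'RT3' are no longer connected, while every pair of electrodes on the same contact is connected when extent is None.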
Filter Line Noise¶
- naplib.preprocessing.filter_line_noise(data=None, field='resp', fs='dataf', f=60, num_taps=501, num_repeats=1, in_place=False)[source]¶
Filter input data with a notch filter to remove line noise and its harmonics. A notch FIR filter is applied at the line noise frequency and all of its harmonics up to the Nyquist rate.
- Parameters:
data (naplib.Data instance, optional) -- Data object containing data to be filtered in one of the fields. If None, then must give the signals to be filtered in the 'field' argument.
field (string | list of np.ndarrays or a multidimensional np.ndarray) -- Field of trials to filter. If a string, it must specify one of the fields of the Data provided in the first argument. If a multidimensional array, the first dimension indicates the trials/instances. Each trial is filtered independently.
fs (string | int, default='dataf') -- Sampling rate of the data. Either a string specifying a field of the Data or an int giving the sampling rate for all trials.
f (float, default=60) -- Line noise frequency, in Hz.
num_taps (int, default=501) -- Number of taps in the FIR filter. Must be odd.
num_repeats (int, default=1) -- Number of times to repeat convolving the filter. This is useful to increase if the number of taps is low compared to the sampling rate (e.g. less than 1 full second).
in_place (bool, default=False) -- Whether to filter the data in-place.
- Returns:
filtered_data
- Return type:
list of np.ndarrays
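To make the idea of notching the line frequency and its harmonics concrete, here is a minimal NumPy sketch of a windowed-sinc FIR notch filter. It is an illustration under stated assumptions, not the filter design used by filter_line_noise: the helper notch_fir, the Hamming window, and the half_width parameter are all hypothetical choices.

```python
import numpy as np

def notch_fir(fs, f=60.0, num_taps=501, half_width=2.0):
    """Hypothetical windowed-sinc FIR with notches at f and its harmonics below Nyquist."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    harmonics = np.arange(f, fs / 2, f)
    n = np.arange(num_taps) - (num_taps - 1) / 2
    window = np.hamming(num_taps)

    def lowpass(fc):
        # Windowed ideal low-pass impulse response with cutoff fc (Hz).
        return 2 * fc / fs * np.sinc(2 * fc / fs * n) * window

    taps = np.zeros(num_taps)
    taps[(num_taps - 1) // 2] = 1.0  # a unit impulse passes every frequency
    for h in harmonics:
        # Subtract a narrow band-pass around each harmonic to carve out a notch.
        taps -= lowpass(h + half_width) - lowpass(h - half_width)
    return taps, harmonics

fs = 500
taps, harmonics = notch_fir(fs)
t = np.arange(4 * fs) / fs
# A 5 Hz signal of interest contaminated with 60 Hz line noise.
noisy = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 60 * t)
cleaned = np.convolve(noisy, taps, mode="same")
```

An odd number of taps keeps the filter symmetric about its center tap, so convolving with mode="same" introduces no time shift. Samples within about half a filter length of either edge are affected by boundary transients.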
Phase Amplitude Extract¶
- naplib.preprocessing.phase_amplitude_extract(data=None, field='resp', fs='dataf', Wn=[[70, 150]], bandnames=None, fs_out=None, n_jobs=1)[source]¶
Extract phase and amplitude (envelope) from a frequency band or a set of frequency bands all at once. Each band is computed by averaging over the envelopes and phases computed from the Hilbert Transform of each filter output in a filterbank of bandpass filters [1].
- Parameters:
data (naplib.Data instance, optional) -- Data object containing data to be filtered in one of the fields. If None, then must give the signals to be filtered in the 'field' argument.
field (string | list of np.ndarrays or a multidimensional np.ndarray) -- Field of trials to filter. If a string, it must specify one of the fields of the Data provided in the first argument. If a multidimensional array, the first dimension indicates the trials/instances. Each trial is filtered independently.
fs (string | int, default='dataf') -- Sampling rate of the data. Either a string specifying a field of the Data or an int giving the sampling rate for all trials.
Wn (list or array-like, shape (n_freq_bands, 2) or (2,), default=[[70, 150]]) -- Lower and upper boundaries for filterbank center frequencies. The default of [[70, 150]] extracts the phase and amplitude of the highgamma band.
bandnames (list of strings, length=n_freq_bands, optional) -- If provided, these are used to create the field names for each frequency band's amplitude and phase in the output Data. Should be the same length as the number of bands specified in Wn. For example, if Wn=[[8,12],[70,150]], and bandnames=['alpha','highgamma'], then the fields of the output Data will be {'alpha phase', 'alpha amp', 'highgamma phase', 'highgamma amp'}. But if bandnames=None, then they will be {'[ 8 12] phase', '[ 8 12] amp', '[ 70 150] phase', '[ 70 150] amp'}.
fs_out (int, default=None) -- If not None, each output phase and amplitude will be resampled to this sampling rate.
n_jobs (int, default=1) -- Number of jobs to use to compute filterbank across channels in parallel in filterbank_hilbert. Using n_jobs != 1 is memory intensive, so it will not necessarily improve performance if working with a large dataset.
- Returns:
phase_amplitude_data -- An instance of naplib.Data containing 2*n_freq_bands fields. For each frequency band, there will be a field for phase and a field for amplitude of that band, each with shape (time, channels) for each trial.
- Return type:
naplib.Data instance
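The field-naming pattern described for bandnames can be reproduced with a short sketch. The helper band_field_names is hypothetical and only mimics the documented names; it assumes the default names come from the string form of each band as a NumPy array, which matches the examples given above.

```python
import numpy as np

def band_field_names(Wn, bandnames=None):
    """Hypothetical helper reproducing the documented field-name pattern."""
    bands = np.atleast_2d(Wn)
    if bandnames is None:
        # Fall back to the string form of each band's boundaries.
        bandnames = [str(band) for band in bands]
    if len(bandnames) != len(bands):
        raise ValueError("bandnames must have one entry per band in Wn")
    names = []
    for name in bandnames:
        names.extend([f"{name} phase", f"{name} amp"])
    return names

named = band_field_names([[8, 12], [70, 150]], bandnames=["alpha", "highgamma"])
unnamed = band_field_names([[8, 12], [70, 150]])
```

Passing explicit bandnames is usually easier to work with, since the default names contain brackets and padding spaces.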
References
Filter Hilbert¶
- naplib.preprocessing.filter_hilbert(x, fs, Wn=[[70, 150]], n_jobs=1)[source]¶
Compute the phase and amplitude (envelope) of a signal over multiple frequency bands, as in [1]. This is done using a filter bank of Gaussian-shaped filters with center frequencies linearly spaced up to 4 Hz and logarithmically spaced above that. The Hilbert Transform of each filter's output is computed, and the amplitude and phase are computed from the complex values. The amplitude and phase are then averaged for each channel over the center frequencies. See [1] for details on the filter bank used.
- Parameters:
x (np.ndarray, shape (time, channels)) -- Signal to filter. Filtering is performed on each channel independently.
fs (int) -- Sampling rate.
Wn (list or array-like, shape (n_freq_bands, 2) or (2,), default=[[70, 150]]) -- Lower and upper boundaries for filterbank center frequencies. The default of [[70, 150]] extracts the phase and amplitude of the highgamma band.
n_jobs (int, default=1) -- Number of jobs to use to compute filterbank across channels in parallel.
- Returns:
x_phase (np.ndarray, shape (time, channels, frequency_bins)) -- Phase of each frequency bin in the filter bank for each channel.
x_envelope (np.ndarray, shape (time, channels, frequency_bins)) -- Envelope of each frequency bin in the filter bank for each channel.
center_freqs (np.ndarray, shape (frequency_bins,)) -- Center frequencies for each frequency bin used in the filter bank.
Examples
>>> import naplib as nl
>>> from naplib.preprocessing import filter_hilbert as f_hilb
>>> import numpy as np
>>> x = np.random.rand(1000,3) # 3 channels of signals
>>> fs = 500
>>> x_phase, x_envelope, freqs = f_hilb(x, fs, Wn=[[1, 50], [70, 150]])
>>> # the outputs have the phase and envelope for each channel and each frequency band
>>> x_phase.shape # 3rd dimension is one for each frequency band in Wn
(1000, 3, 2)
>>> x_envelope.shape
(1000, 3, 2)
>>> freqs[0] # center frequency of first filter bank filter
1.21558792
>>> freqs[-1] # center frequency of last filter bank filter
143.97075186
Filterbank Hilbert¶
- naplib.preprocessing.filterbank_hilbert(x, fs, Wn=[70, 150], n_jobs=1)[source]¶
Compute the phase and amplitude (envelope) of a signal for a single frequency band, as in [1]. This is done using a filter bank of Gaussian-shaped filters with center frequencies linearly spaced up to 4 Hz and logarithmically spaced above that. The Hilbert Transform of each filter's output is computed, and the amplitude and phase are computed from the complex values. See [1] for details on the filter bank used.
- Parameters:
x (np.ndarray, shape (time, channels)) -- Signal to filter. Filtering is performed on each channel independently.
fs (int) -- Sampling rate.
Wn (list or array-like, length 2, default=[70, 150]) -- Lower and upper boundaries for filterbank center frequencies. A range of [1, 150] results in 42 filters.
n_jobs (int, default=1) -- Number of jobs to use to compute filterbank across channels in parallel.
- Returns:
x_phase (np.ndarray, shape (time, channels, frequency_bins)) -- Phase of each frequency bin in the filter bank for each channel.
x_envelope (np.ndarray, shape (time, channels, frequency_bins)) -- Envelope of each frequency bin in the filter bank for each channel.
center_freqs (np.ndarray, shape (frequency_bins,)) -- Center frequencies for each frequency bin used in the filter bank.
Examples
>>> import naplib as nl
>>> from naplib.preprocessing import filterbank_hilbert as fb_hilb
>>> import numpy as np
>>> x = np.random.rand(1000,3) # 3 channels of signals
>>> fs = 500
>>> x_phase, x_envelope, freqs = fb_hilb(x, fs, Wn=[1, 150])
>>> # the outputs have the phase and envelope for each channel and each filter in the filterbank
>>> x_phase.shape # 3rd dimension is one for each filter in filterbank
(1000, 3, 42)
>>> x_envelope.shape
(1000, 3, 42)
>>> freqs[0] # center frequency of first filter bank filter
1.21558792
>>> freqs[-1] # center frequency of last filter bank filter
143.97075186
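For intuition about what a single filter in such a filter bank does, the following NumPy sketch applies one Gaussian-shaped band-pass in the frequency domain and takes the analytic signal to obtain phase and envelope. It is a simplified illustration, not the filter bank from [1]: the helper gaussian_band_analytic and the center and sd values are hypothetical.

```python
import numpy as np

def gaussian_band_analytic(x, fs, center, sd):
    """Hypothetical sketch: Gaussian band-pass plus Hilbert transform via the FFT.

    x has shape (time, channels); center and sd are in Hz.
    """
    n = x.shape[0]
    freqs = np.fft.fftfreq(n, d=1 / fs)
    # Gaussian gain centred on `center`; keeping only positive frequencies
    # (doubled) yields the analytic signal of the band-passed input.
    gain = np.exp(-0.5 * ((freqs - center) / sd) ** 2)
    gain[freqs < 0] = 0.0
    gain[freqs > 0] *= 2.0
    analytic = np.fft.ifft(np.fft.fft(x, axis=0) * gain[:, None], axis=0)
    # Phase is the angle and envelope the magnitude of the complex values.
    return np.angle(analytic), np.abs(analytic)

fs = 500
t = np.arange(2 * fs) / fs
# One channel: a 100 Hz carrier with constant amplitude 3.
x = (3.0 * np.cos(2 * np.pi * 100 * t))[:, None]
phase, envelope = gaussian_band_analytic(x, fs, center=100.0, sd=10.0)
```

For this test signal the envelope is constant at the carrier amplitude and the phase advances with the 100 Hz carrier. Repeating the step for each center frequency and stacking the results along a third axis would give arrays shaped (time, channels, frequency_bins).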
Butterworth Filter¶
- naplib.preprocessing.filter_butter(data=None, field='resp', btype='bandpass', Wn=[70, 150], fs='dataf', order=2, return_filters=False, in_place=False)[source]¶
Filter time series signals using an Nth order digital Butterworth filter. The filter is applied to each column of each trial in the field data.
- Parameters:
data (naplib.Data instance, optional) -- Data object containing the data to be filtered in one of its fields. If not given, the data to be filtered must be passed directly as the field argument.
field (string | list of np.ndarrays or a multidimensional np.ndarray) -- Field to filter. If a string, it must specify one of the fields of the Data provided in the first argument. If a multidimensional array, the first dimension indicates the trials/instances. Each trial's data must be of shape (time, channels).
Wn (float, list or array-like, default=[70,150]) -- Critical frequency or frequencies, in Hz. For lowpass and highpass filters, Wn is a scalar; for bandpass and bandstop filters, Wn is a length-2 sequence.
btype (string, default='bandpass') -- Filter type, one of {'lowpass', 'highpass', 'bandpass', 'bandstop'}.
fs (string | float, default='dataf') -- Sampling rate of the field to filter. If a string, must specify a field of the Data object. Can be a single float if all trials have the same sampling rate, or can be a list of floats specifying the sampling rate for each trial.
order (int, default=2) -- The order of the filter.
return_filters (bool, default=False) -- If True, return the filter transfer function coefficients from each trial's filtering.
in_place (bool, default=False) -- Whether to filter the data in-place.
- Returns:
filtered_data (list of np.ndarrays) -- Filtered time series.
filter (list) -- Filter transfer function coefficients returned as a list of (b, a) tuples. Only returned if return_filters is True.
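A per-trial Butterworth filter that returns (b, a) coefficients can be sketched with SciPy. This is an assumption-laden illustration rather than the implementation of filter_butter: the helper butter_filter_trials is hypothetical, and it applies causal filtering with scipy.signal.lfilter, whereas the library may filter differently (for example, with a zero-phase method).

```python
import numpy as np
from scipy import signal

def butter_filter_trials(trials, Wn, fs, btype="bandpass", order=2):
    """Hypothetical sketch: filter each column of each (time, channels) trial."""
    filtered, filters = [], []
    for trial in trials:
        # Passing fs lets Wn be given in Hz rather than normalized units.
        b, a = signal.butter(order, Wn, btype=btype, fs=fs)
        # Assumption: causal filtering along time; a zero-phase variant would use filtfilt.
        filtered.append(signal.lfilter(b, a, trial, axis=0))
        filters.append((b, a))
    return filtered, filters

rng = np.random.default_rng(0)
# Two trials of different lengths, each shaped (time, channels).
trials = [rng.normal(size=(1000, 2)), rng.normal(size=(800, 2))]
filtered, filters = butter_filter_trials(trials, Wn=[70, 150], fs=500)
```

Note that a bandpass or bandstop design of order N produces a transfer function of order 2N, so order=2 here yields five coefficients in each of b and a.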