Source code for handwriting_features.interface.featurizer
from handwriting_features.interface.featurizer.handlers import MultiSubjectFeatureExtractorHandler
class FeatureExtractor(object):
"""
Class implementing the feature extractor interface for the Featurizer API.
For more information about the Featurizer API, see the following repositories:
1. [server side](https://github.com/BDALab/featurizer-api)
2. [client side](https://github.com/BDALab/featurizer-api-client)
For more information about the attributes, see: ``extract(...)``
"""
def __init__(self, values, labels=None, **configuration):
"""
Initializes the FeatureExtractor featurizer API interface.
:param values: data values to extract the features from
:type values: numpy.ndarray
:param labels: data labels for data samples, defaults to None
:type labels: list, optional
:param configuration: common extractor configuration
:type configuration: **kwargs, optional
"""
# Set the sample values/labels
self.values = values
self.labels = labels if labels else []
# Set the extractor configuration
self.configuration = configuration if configuration else {}
# Initialize the handler
self.handler = MultiSubjectFeatureExtractorHandler
def extract(self, pipeline):
"""
Interface method: extract the features.
**Data**
1. data is of type: ``numpy.ndarray``.
2. data is mandatory.
3. data shape: in general, data is expected to have the shape (M, ..., D), where
M is the number of subjects (i.e. subjects are in the first dimension) and
D is the number of data samples (each of shape ...).
1. in the case of data having the following shape: (D, ), the API
assumes it is a vector of D data sample points for one subject.
It transforms the data to a row vector: (1, D) to add the
dimension for the subject.
2. in the case of data having the following shape: (M, ..., D),
the API does not transform the data, but it assumes there are
M subjects and D data samples, each having (...) dimensionality,
e.g. if data has the shape (M, 3, 10) it means that there are
M subjects and each of the subjects has 10 data samples (each
being three dimensional).
**Labels**
1. labels are of type: ``list``.
2. labels are optional.
3. labels are of length D (for each data sample, there is one label)
**Configuration**
1. configuration is of type: ``dict``.
2. configuration is optional.
3. configuration provides common kwargs for feature extraction
**Pipeline**
1. pipeline is of type: ``list``.
2. pipeline is mandatory.
3. each element in the pipeline is of type: ``dict``.
4. each element in the pipeline has the following keys: a) ``name``
to hold the name of the feature to be computed, and b) ``args``
to hold the arguments (kwargs) for the specific feature extraction
method that is going to be used (it is of type: ``dict``).
**Output**
The extracted features follow the same shape convention as the input
data: the subjects are in the first dimension, and the features are
in the last dimension (each feature having shape ...).
:param pipeline: pipeline of the features to be extracted
:type pipeline: list
:return: extracted features and labels
:rtype: dict {"features": ..., "labels": ...}
"""
return self.handler.extract(self.values, self.labels, pipeline, **self.configuration)
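
The pipeline format described in the ``extract`` docstring (a list of dicts, each with a ``name`` and an ``args`` dict) can be checked before the extractor is called. The following is a minimal sketch and not part of the package: ``validate_pipeline`` is a hypothetical helper, and the feature names and arguments in the example pipeline (``velocity``, ``stroke_length``, ``axis``) are placeholders chosen for illustration, not confirmed names from the library. Treating a missing ``args`` key as "no extra arguments" is also an assumption.

```python
def validate_pipeline(pipeline):
    """Check the pipeline structure described in ``extract`` (hypothetical helper)."""
    # The pipeline itself must be a list
    if not isinstance(pipeline, list):
        raise TypeError("pipeline must be a list")

    for index, element in enumerate(pipeline):
        # Each element of the pipeline must be a dict
        if not isinstance(element, dict):
            raise TypeError(f"pipeline[{index}] must be a dict")

        # ``name`` holds the name of the feature to be computed
        if "name" not in element:
            raise ValueError(f"pipeline[{index}] is missing the 'name' key")

        # ``args`` holds the kwargs for the feature extraction method
        # (assumption: a missing ``args`` is treated as an empty dict)
        if not isinstance(element.get("args", {}), dict):
            raise TypeError(f"pipeline[{index}]['args'] must be a dict")

    return pipeline


# Example pipeline; the feature names and arguments are placeholders
pipeline = [
    {"name": "velocity", "args": {"axis": "x"}},
    {"name": "stroke_length", "args": {}},
]
validate_pipeline(pipeline)

# With the package installed, the validated pipeline would then be passed on
# as in the class above:
#   extractor = FeatureExtractor(values, labels, **configuration)
#   result = extractor.extract(pipeline)  # dict with "features" and "labels"
```

Validating up front keeps structural mistakes (a bare string instead of a list, a missing ``name``) from surfacing later inside the handler, where the error may be harder to trace back to the pipeline definition.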