Source code for handwriting_features.interface.featurizer

from handwriting_features.interface.featurizer.handlers import MultiSubjectFeatureExtractorHandler


class FeatureExtractor(object):
    """
    Class implementing the features extractor interface for the Featurizer API.

    For more information about featurizer, see the following repositories:
    1. [server side](#github.com/BDALab/featurizer-api)
    1. [client side](#github.com/BDALab/featurizer-api-client)

    For more information about the attributes, see: ``extract(...)``
    """

    def __init__(self, values, labels=None, **configuration):
        """
        Initializes the FeatureExtractor featurizer API interface.

        :param values: data values to extract the features from
        :type values: numpy.ndarray
        :param labels: data labels for data samples, defaults to None
        :type labels: list, optional
        :param configuration: common extractor configuration
        :type configuration: **kwargs, optional
        """

        # Set the sample values/labels
        self.values = values
        self.labels = labels if labels else []

        # Set the extractor configuration
        self.configuration = configuration if configuration else {}

        # Initialize the handler
        self.handler = MultiSubjectFeatureExtractorHandler

    def extract(self, pipeline):
        """
        Interface method: extract the features.

        **Data**

        1. data is of type: ``numpy.ndarray``.
        2. data is mandatory.
        3. data shape: in general, data should have the shape (M, ..., D), where
           M stands for the number of subjects (i.e. subjects are in the first
           dimension), and D stands for the number of data samples (each of
           shape ...).

            1. if data has the shape (D, ), the API assumes it is a vector of
               D data sample points for one subject. It transforms the data
               to a row vector (1, D) to add the dimension for the subject.
            2. if data has the shape (M, ..., D), the API does not transform
               the data, but it assumes there are M subjects and D data
               samples, each having (...) dimensionality, e.g. if data has
               the shape (M, 3, 10), it means that there are M subjects and
               each of the subjects has 10 data samples (each being three
               dimensional).

        **Labels**

        1. labels are of type: ``list``.
        2. labels are optional.
        3. labels are of length D (for each data sample, there is one label).

        **Configuration**

        1. configuration is of type: ``dict``.
        2. configuration is optional.
        3. configuration provides common kwargs for feature extraction.

        **Pipeline**

        1. pipeline is of type: ``list``.
        2. pipeline is mandatory.
        3. each element in the pipeline is of type: ``dict``.
        4. each element in the pipeline has the following keys:
           a) ``name`` to hold the name of the feature to be computed, and
           b) ``args`` to hold the arguments (kwargs) for the specific feature
           extraction method that is going to be used (it is of type: ``dict``).

        **Output**

        The extracted features follow the same shape convention as the input
        data: the subjects are in the first dimension, and the features are in
        the last dimension (each feature having shape ...).

        :param pipeline: pipeline of the features to be extracted
        :type pipeline: list
        :return: extracted features and labels
        :rtype: dict {"features": ..., "labels": ...}
        """

        return self.handler.extract(self.values, self.labels, pipeline, **self.configuration)
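
The pipeline structure described in the ``extract`` docstring (a ``list`` of ``dict`` elements, each with a ``name`` and an ``args`` key) can be checked before it is handed to the extractor. The following is a minimal sketch under stated assumptions: ``validate_pipeline`` is a hypothetical helper that is not part of ``handwriting_features``, and the feature names and arguments in ``example_pipeline`` are placeholders chosen only to show the structure, not real feature names.

```python
def validate_pipeline(pipeline):
    """Check a pipeline against the structure described in ``extract``.

    Hypothetical helper for illustration only; it is not part of the
    handwriting_features package.
    """
    # The pipeline is mandatory and must be a list
    if not isinstance(pipeline, list):
        raise TypeError("pipeline must be a list")

    for index, element in enumerate(pipeline):
        # Each element in the pipeline must be a dict
        if not isinstance(element, dict):
            raise TypeError(f"pipeline[{index}] must be a dict")

        # ``name`` holds the name of the feature to be computed
        if not isinstance(element.get("name"), str):
            raise ValueError(f"pipeline[{index}] needs a string 'name'")

        # ``args`` holds the kwargs for the feature extraction method
        if not isinstance(element.get("args"), dict):
            raise TypeError(f"pipeline[{index}]['args'] must be a dict")

    return pipeline


# Placeholder feature names and arguments (assumptions, not real features)
example_pipeline = [
    {"name": "example_feature_a", "args": {"statistics": ["mean"]}},
    {"name": "example_feature_b", "args": {}},
]

validate_pipeline(example_pipeline)
```

Given ``values`` and ``labels`` prepared as described in the docstring, a pipeline validated this way would then be passed as ``FeatureExtractor(values, labels).extract(example_pipeline)``. Raising early on a malformed element is a design choice of this sketch; the library's own handler may validate its input differently.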