pylelemmatize.GenericLemmatizer
- class pylelemmatize.GenericLemmatizer(mapping_dict={}, unknown_chr='�', unicode_normalization='Dense')
Bases: AbstractLemmatizer
- Parameters:
unknown_chr (str)
unicode_normalization (Literal['Dense', 'Composed', None])
- __init__(mapping_dict={}, unknown_chr='�', unicode_normalization='Dense')
- Parameters:
unknown_chr (str)
unicode_normalization (Literal['Dense', 'Composed', None])
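The `unicode_normalization` option matters because the same visible character can be encoded in more than one way. The sketch below uses only the standard library's `unicodedata` module to show this. It assumes that `'Composed'` corresponds to Unicode NFC, which this page does not state. The meaning of `'Dense'` is not described here, so the sketch leaves it out.

```python
import unicodedata

# "é" written two ways: one precomposed code point, or "e" plus a combining acute accent.
precomposed = "\u00e9"
decomposed = "e\u0301"

# The strings render identically but differ code point by code point.
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 1 2

# Assumption: 'Composed' behaves like NFC, which merges base letter + combining mark.
composed = unicodedata.normalize("NFC", decomposed)
print(composed == precomposed)              # True
```

A character-level mapping keyed on the precomposed `"é"` would miss the decomposed form unless the input is normalized first, which is presumably why the lemmatizer normalizes its input.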
Methods
- __init__([mapping_dict, unknown_chr, ...])
- fast_alphabet_extraction(text)
- from_alphabet_mapping(src_alphabet_str[, ...])
- get_cer(pred, true)
- get_encoding_information_loss(text)
- get_unigram(text)
- len(): Return the size of the destination alphabet.
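Several of these methods work on plain character statistics. The following is a minimal standard-library sketch of what `get_unigram`, `fast_alphabet_extraction`, and `get_cer` plausibly compute. The helper names `char_unigram`, `alphabet_of`, and `cer` are hypothetical. The CER definition used here (Levenshtein edit distance divided by the length of the reference string) is the conventional one and is assumed, not confirmed, for this library.

```python
from collections import Counter


def char_unigram(text: str) -> Counter:
    """Hypothetical stand-in for get_unigram: count each character."""
    return Counter(text)


def alphabet_of(text: str) -> str:
    """Hypothetical stand-in for fast_alphabet_extraction: sorted distinct characters."""
    return "".join(sorted(set(text)))


def cer(pred: str, true: str) -> float:
    """Character error rate as Levenshtein distance / len(true) (assumed definition)."""
    if not true:
        # Arbitrary convention for an empty reference string.
        return 0.0 if not pred else 1.0
    # Single-row dynamic programming: prev[j] = distance(pred[:i-1], true[:j]).
    prev = list(range(len(true) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, t in enumerate(true, start=1):
            curr.append(min(
                prev[j] + 1,             # delete p from pred
                curr[j - 1] + 1,         # insert t into pred
                prev[j - 1] + (p != t),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1] / len(true)


print(char_unigram("abba")["a"])     # 2
print(alphabet_of("hello"))          # ehlo
print(cer("kitten", "sitting"))      # 0.428571... (3 edits / 7 reference characters)
```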
Attributes
- alphabet_tsv
- mapping_tsv
- unicode_normalization
- unknown_chr
- classmethod from_alphabet_mapping(src_alphabet_str, dst_alphabet_str=None, unknown_chr='�', override_map=None, min_similarity=0.25, verbose=0)
- Return type:
- Parameters:
src_alphabet_str (str)
dst_alphabet_str (str | None)
unknown_chr (str)
override_map (Dict[str, str] | None)
min_similarity (float)
verbose (int)
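`from_alphabet_mapping` builds a lemmatizer from a source alphabet and an optional destination alphabet. How it chooses targets is not described on this page. The sketch below shows one plausible strategy, not the library's algorithm: entries in `override_map` take priority, characters already in the destination alphabet map to themselves, characters whose diacritic-stripped base letter (obtained via Unicode NFD) is in the destination map to that base letter, and everything else maps to `unknown_chr`. The function name `build_mapping` is hypothetical, it requires a destination alphabet, and it does not model `min_similarity` or `verbose`.

```python
import unicodedata
from typing import Dict, Optional


def build_mapping(
    src_alphabet_str: str,
    dst_alphabet_str: str,
    unknown_chr: str = "\ufffd",
    override_map: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical sketch: map every source character to a destination character."""
    dst = set(dst_alphabet_str)
    overrides = override_map or {}
    mapping: Dict[str, str] = {}
    for ch in src_alphabet_str:
        if ch in overrides:
            # An explicit user choice wins.
            mapping[ch] = overrides[ch]
        elif ch in dst:
            # Already representable: keep it unchanged.
            mapping[ch] = ch
        else:
            # Decompose (NFD) and keep the first code point, i.e. the base letter.
            base = unicodedata.normalize("NFD", ch)[0]
            mapping[ch] = base if base in dst else unknown_chr
    return mapping


m = build_mapping("a\u00e4\u00dfz\u0436", "abcsz", override_map={"\u00df": "s"})
# 'ä' -> 'a' (base letter), 'ß' -> 's' (override), 'ж' -> '\ufffd' (no match)
```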
- __call__(text)
Convert text to the alphabet representation.
- Return type:
str
- Parameters:
text (str)
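Conceptually, calling the lemmatizer replaces each input character by its mapped counterpart. A minimal sketch of that behavior follows. It assumes the mapping is a plain `dict` from single characters to output strings, that input is normalized (here with NFC) before lookup, and that unmapped characters become `unknown_chr`. The function name `apply_mapping` is hypothetical.

```python
import unicodedata
from typing import Dict, Optional


def apply_mapping(
    text: str,
    mapping: Dict[str, str],
    unknown_chr: str = "\ufffd",
    normalization: Optional[str] = "NFC",
) -> str:
    """Hypothetical sketch of character-wise conversion; not the library's code."""
    if normalization is not None:
        # Assumption: normalize first so keys match regardless of input encoding.
        text = unicodedata.normalize(normalization, text)
    # Characters missing from the mapping are replaced by unknown_chr.
    return "".join(mapping.get(ch, unknown_chr) for ch in text)


mapping = {"a": "a", "\u00e4": "a", "b": "b", " ": " "}
print(apply_mapping("ba\u0308b a", mapping))  # bab a  (decomposed ä is composed, then mapped)
print(apply_mapping("abc", mapping))          # ab�  (c is not in the mapping)
```

Skipping normalization (`normalization=None`) would leave the combining mark U+0308 as a separate, unmapped character, so it would surface as `unknown_chr`.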
- property src_alphabet_str: str
- property dst_alphabet_str: str
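The two properties expose the alphabets on either side of the mapping. One plausible way to derive them from a character mapping, together with the documented `len()` (the size of the destination alphabet), is sketched below. The class `ToyMapping` is hypothetical, and both the sorted ordering and the inclusion of `unknown_chr` in the destination alphabet are assumptions.

```python
from typing import Dict


class ToyMapping:
    """Hypothetical stand-in illustrating the src/dst alphabet properties."""

    def __init__(self, mapping: Dict[str, str], unknown_chr: str = "\ufffd"):
        self.mapping = dict(mapping)
        self.unknown_chr = unknown_chr

    @property
    def src_alphabet_str(self) -> str:
        # Every character with an explicit mapping, in code point order.
        return "".join(sorted(self.mapping))

    @property
    def dst_alphabet_str(self) -> str:
        # Distinct output characters plus unknown_chr (an assumption).
        outputs = set("".join(self.mapping.values())) | {self.unknown_chr}
        return "".join(sorted(outputs))

    def len(self) -> int:
        # Mirrors the documented len(): size of the destination alphabet.
        return len(self.dst_alphabet_str)


lem = ToyMapping({"a": "a", "\u00e4": "a", "b": "b"})
print(lem.src_alphabet_str)  # abä
print(lem.dst_alphabet_str)  # ab�
print(lem.len())             # 3
```

Because several source characters can collapse onto one output (here `a` and `ä` both become `a`), the destination alphabet is typically smaller than the source alphabet.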