pylelemmatize.GenericLemmatizer

class pylelemmatize.GenericLemmatizer(mapping_dict={}, unknown_chr='�', unicode_normalization='Dense')[source]

Bases: AbstractLemmatizer

Parameters:
  • unknown_chr (str)

  • unicode_normalization (Literal['Dense', 'Composed', None])

Methods

__init__([mapping_dict, unknown_chr, ...])

copy_removing_unused_inputs(txt)

fast_alphabet_extraction(text)

from_alphabet_mapping(src_alphabet_str[, ...])

get_cer(pred, true)

get_encoding_information_loss(text)

get_unigram(text)

len()

Return the size of the destination alphabet.

Attributes

alphabet_tsv

dst_alphabet_str

mapping_tsv

src_alphabet_str

unicode_normalization

unknown_chr

classmethod from_alphabet_mapping(src_alphabet_str, dst_alphabet_str=None, unknown_chr='�', override_map=None, min_similarity=0.25, verbose=0)[source]
Return type:

GenericLemmatizer

Parameters:
  • src_alphabet_str (str)

  • dst_alphabet_str (str | None)

  • unknown_chr (str)

  • override_map (Dict[str, str] | None)

  • min_similarity (float)

  • verbose (int)

copy_removing_unused_inputs(txt)[source]
Return type:

Any

Parameters:
  • txt (str)

len()[source]

Return the size of the destination alphabet.

Return type:

int

__call__(text)[source]

Convert text to its representation in the destination alphabet.

Return type:

str

Parameters:
  • text (str)
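
The page does not show how the conversion is carried out. As a rough illustration only, the sketch below assumes a per-character lookup in which any character missing from the mapping is replaced by the unknown character. The function map_characters and the example mapping are hypothetical and are not part of pylelemmatize; the library's actual behaviour (including any Unicode normalization step) may differ.

```python
from typing import Dict


def map_characters(
    text: str,
    mapping: Dict[str, str],
    unknown_chr: str = "\N{REPLACEMENT CHARACTER}",
) -> str:
    """Map each character through `mapping`.

    Hypothetical helper, not the library's implementation: characters
    that have no entry in `mapping` are replaced by `unknown_chr`.
    """
    return "".join(mapping.get(ch, unknown_chr) for ch in text)


# Hypothetical mapping: fold the long s and an accented vowel onto plain letters.
example_mapping = {"ſ": "s", "é": "e", "s": "s", "e": "e", "t": "t", " ": " "}

converted = map_characters("ſet é", example_mapping)
```

With the documented API, the equivalent step would presumably be calling a GenericLemmatizer instance directly on a string, since __call__ takes text (str) and returns str.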

property src_alphabet_str: str
property dst_alphabet_str: str