pylelemmatize.Seq2SeqDs
- class pylelemmatize.Seq2SeqDs(text_blocks, input_mapper=None, output_mapper=None, min_input_seqlen=50, min_output_seqlen=50, one2one_mapping=None, crop_to_seqlen=None, input_is_onehot=False, output_is_onehot=False)[source]
Bases: object
- Parameters:
text_blocks (Tuple[List[str], List[str]])
input_mapper (LemmatizerBMP | None)
output_mapper (LemmatizerBMP | None)
min_input_seqlen (int)
min_output_seqlen (int)
one2one_mapping (bool | None)
crop_to_seqlen (int | None)
input_is_onehot (bool)
output_is_onehot (bool)
- __init__(text_blocks, input_mapper=None, output_mapper=None, min_input_seqlen=50, min_output_seqlen=50, one2one_mapping=None, crop_to_seqlen=None, input_is_onehot=False, output_is_onehot=False)[source]
- Parameters:
text_blocks (Tuple[List[str], List[str]])
input_mapper (LemmatizerBMP | None)
output_mapper (LemmatizerBMP | None)
min_input_seqlen (int)
min_output_seqlen (int)
one2one_mapping (bool | None)
crop_to_seqlen (int | None)
input_is_onehot (bool)
output_is_onehot (bool)
Methods
__init__(text_blocks[, input_mapper, ...])
compute_ds_CER([use_editdistance]): Compute the Character Error Rate (CER) of the dataset.
create_selfsupervised_ds(corpus, mapper[, ...])
from_parallel_txt_corpus(input_glob, ...)
load_parallel_txt_corpus(input_glob, output_glob)
render_sample([n, include_alphabet])
shuffle()
split([train_ratio, shuffle])
- static load_icdar2019_parallel_txt_corpus(input_paths, max_insertions, min_length, max_length)[source]
- Return type:
List[Tuple[List[str], List[str]]]
- Parameters:
input_paths (str | List[str])
max_insertions (int)
min_length (int)
max_length (int)
- static load_parallel_txt_corpus(input_glob, output_glob, check_integrity='cleanup')[source]
- Return type:
List[Tuple[List[str], List[str]]]
- Parameters:
input_glob (str | List[str])
output_glob (str | List[str])
check_integrity (Literal['cleanup', 'raise', 'ignore'])
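The signature above only states that two glob patterns (or lists of paths) go in and a list of (input lines, output lines) tuples comes out. As a rough illustration of that shape, and not the library's actual implementation, the stand-alone sketch below pairs files by sorted order. The function name pair_parallel_files and the meaning given to each check_integrity value (skip, raise, or keep pairs whose line counts differ) are assumptions made for this example only.

```python
import glob
from typing import List, Literal, Tuple, Union

PathSpec = Union[str, List[str]]


def _expand(spec: PathSpec) -> List[str]:
    """Turn a glob pattern or an explicit list of paths into a sorted list."""
    paths = glob.glob(spec) if isinstance(spec, str) else list(spec)
    return sorted(paths)


def pair_parallel_files(
    input_glob: PathSpec,
    output_glob: PathSpec,
    check_integrity: Literal["cleanup", "raise", "ignore"] = "cleanup",
) -> List[Tuple[List[str], List[str]]]:
    """Hypothetical stand-in: pair input/output files and read their lines.

    Assumed semantics (NOT taken from the library):
      "cleanup" skips pairs whose line counts differ,
      "raise"   raises ValueError on the first such pair,
      "ignore"  keeps every pair as-is.
    """
    input_paths = _expand(input_glob)
    output_paths = _expand(output_glob)
    if len(input_paths) != len(output_paths):
        raise ValueError("input and output file counts differ")

    pairs: List[Tuple[List[str], List[str]]] = []
    for in_path, out_path in zip(input_paths, output_paths):
        with open(in_path, encoding="utf-8") as handle:
            in_lines = handle.read().splitlines()
        with open(out_path, encoding="utf-8") as handle:
            out_lines = handle.read().splitlines()

        if len(in_lines) != len(out_lines):
            if check_integrity == "raise":
                raise ValueError(f"line count mismatch: {in_path} vs {out_path}")
            if check_integrity == "cleanup":
                continue  # drop the inconsistent pair
        pairs.append((in_lines, out_lines))
    return pairs
```

Each returned element matches the documented List[Tuple[List[str], List[str]]] shape, so a single element could plausibly be passed as text_blocks to the constructor, though that too should be checked against the source.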
- static from_parallel_txt_corpus(input_glob, output_glob, **kwargs)[source]
- Return type:
- Parameters:
input_glob (str | List[str])
output_glob (str | List[str])
- static create_selfsupervised_ds(corpus, mapper, mapped_is_input=True, add_all_occuring_to_input=True, **kwargs)[source]
- Return type:
- Parameters:
corpus (List[str])
mapper (LemmatizerBMP)
mapped_is_input (bool)
add_all_occuring_to_input (bool)
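The method summary describes compute_ds_CER as computing the Character Error Rate (CER) of the dataset, but this page does not show how. For readers unfamiliar with the metric, here is a minimal pure-Python sketch of one common definition: total Levenshtein edit distance between input and output strings, divided by the total number of output characters. The helper names levenshtein and corpus_cer are hypothetical, and the library may normalise differently or delegate to an external edit-distance package (the use_editdistance argument suggests an optional backend).

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous: List[int] = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if ch_a == ch_b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """CER over (input, output) string pairs, normalised by output length."""
    pairs = list(pairs)
    total_edits = sum(levenshtein(src, tgt) for src, tgt in pairs)
    total_chars = sum(len(tgt) for _, tgt in pairs)
    if total_chars == 0:
        raise ValueError("cannot compute CER on an empty reference")
    return total_edits / total_chars


if __name__ == "__main__":
    # One substitution over five reference characters in total.
    print(corpus_cer([("abd", "abc"), ("xy", "xy")]))
```

Normalising by the summed reference length (rather than averaging per-pair rates) keeps short lines from dominating the score; whether the library makes the same choice is not stated here.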