utsuho.converters module

Converters for deterministic Japanese text normalization.

class utsuho.converters.FullToHalfConverter(config: Optional[WidthConverterConfig] = None)

Bases: object

Full-width katakana to half-width katakana converter.

Parameters:

config (WidthConverterConfig, optional) -- Configuration selecting which non-katakana characters are also converted.

convert(s: str) -> str

Convert full-width katakana to half-width katakana.

Parameters:

s (str) -- String containing characters to convert to half-width katakana.

Returns:

The converted string.

Return type:

str
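
To illustrate what a full-width to half-width katakana mapping involves, here is a minimal sketch that uses only Python's standard unicodedata module. It is not utsuho's implementation: the helper name full_to_half_katakana and the table-building approach are assumptions made for this example. It handles katakana letters only and does not model any WidthConverterConfig options.

```python
import unicodedata

# Reverse table built from the half-width katakana range (U+FF66..U+FF9F).
# NFKC maps each half-width character to its full-width counterpart; the
# half-width voiced marks map to the combining marks U+3099 and U+309A.
FULL_TO_HALF = {
    unicodedata.normalize("NFKC", chr(cp)): chr(cp)
    for cp in range(0xFF66, 0xFFA0)
}


def full_to_half_katakana(s: str) -> str:
    """Hypothetical helper: narrow full-width katakana letters only."""
    out = []
    for ch in s:
        if "\u30a1" <= ch <= "\u30fa":
            # NFD splits a voiced letter into base letter + combining mark.
            parts = unicodedata.normalize("NFD", ch)
            if all(p in FULL_TO_HALF for p in parts):
                out.extend(FULL_TO_HALF[p] for p in parts)
                continue
        # Anything without a half-width form is kept as-is.
        out.append(ch)
    return "".join(out)


print(full_to_half_katakana("ガイド"))
```

A voiced letter becomes two half-width characters (base letter plus voicing mark), so the output can be longer than the input.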

class utsuho.converters.HalfToFullConverter(config: Optional[WidthConverterConfig] = None)

Bases: object

Half-width katakana to full-width katakana converter.

Parameters:

config (WidthConverterConfig, optional) -- Configuration selecting which non-katakana characters are also converted.

convert(s: str) -> str

Convert half-width katakana to full-width katakana.

Parameters:

s (str) -- String containing characters to convert to full-width katakana.

Returns:

The converted string.

Return type:

str
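
The reverse direction can be approximated with the standard library alone. The sketch below is an assumption about one workable approach, not a description of utsuho's internals: it applies NFKC normalization only to runs of half-width katakana, so that other characters (for example full-width ASCII) stay untouched. The name half_to_full_katakana is hypothetical.

```python
import re
import unicodedata

# Runs of half-width katakana letters and voicing marks (U+FF66..U+FF9F).
_HALF_KATAKANA_RUN = re.compile("[\uff66-\uff9f]+")


def half_to_full_katakana(s: str) -> str:
    """Hypothetical helper: widen half-width katakana runs via NFKC."""
    # NFKC widens each letter and composes a base letter with a following
    # voicing mark into a single precomposed full-width character.
    return _HALF_KATAKANA_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), s
    )


print(half_to_full_katakana("\uff76\uff9e\uff72\uff84\uff9e"))
```

Normalizing whole runs rather than single characters is what lets a letter and its trailing voicing mark merge into one character.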

class utsuho.converters.HiraganaToKatakanaConverter

Bases: object

Hiragana to katakana converter.

convert(s: str) -> str

Convert hiragana to katakana.

Parameters:

s (str) -- String containing characters to convert to katakana.

Returns:

The converted string.

Return type:

str
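
In Unicode, the hiragana letters U+3041..U+3096 sit exactly 0x60 code points below the corresponding katakana letters, so this conversion can be sketched as a fixed offset. The snippet below is a standalone illustration under that assumption and does not claim to mirror how utsuho implements it; hiragana_to_katakana is a hypothetical name.

```python
# Translation table: each hiragana letter maps to the katakana letter
# 0x60 code points above it.
_HIRAGANA_TO_KATAKANA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def hiragana_to_katakana(s: str) -> str:
    """Hypothetical helper: shift hiragana letters into the katakana block."""
    return s.translate(_HIRAGANA_TO_KATAKANA)


print(hiragana_to_katakana("ひらがな"))  # ヒラガナ
```

Characters outside the table, such as kanji, ASCII, and the long vowel mark, pass through unchanged.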

class utsuho.converters.KatakanaToHiraganaConverter

Bases: object

Katakana to hiragana converter.

convert(s: str) -> str

Convert katakana to hiragana.

Parameters:

s (str) -- String containing characters to convert to hiragana.

Returns:

The converted string.

Return type:

str
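
A standalone sketch of the katakana to hiragana direction follows. It assumes a plain code point shift of 0x60 and is not taken from utsuho's source; katakana_to_hiragana is a hypothetical name. Katakana letters U+30F7..U+30FA have no precomposed hiragana counterpart, so this sketch leaves them unchanged.

```python
# Translation table: katakana letters U+30A1..U+30F6 map to the hiragana
# letter 0x60 code points below them.
_KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def katakana_to_hiragana(s: str) -> str:
    """Hypothetical helper: shift katakana letters into the hiragana block."""
    return s.translate(_KATAKANA_TO_HIRAGANA)


print(katakana_to_hiragana("カタカナ"))  # かたかな
```

The long vowel mark (U+30FC) is outside the table, so a word such as コーヒー keeps its marks after conversion.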

class utsuho.converters.WidthConverterConfig(punctuation: bool = True, corner_brucket: bool = True, conjunction_mark: bool = True, length_mark: bool = True, space: bool = True, ascii_symbol: bool = True, ascii_alphabet: bool = True, ascii_digit: bool = True, wave_dash: bool = False)

Bases: object

Configuration for converting non-katakana characters.

Parameters:
  • punctuation (bool, default=True) -- Whether to convert punctuation marks.

  • corner_brucket (bool, default=True) -- Whether to convert corner brackets.

  • conjunction_mark (bool, default=True) -- Whether to convert conjunction marks.

  • length_mark (bool, default=True) -- Whether to convert length marks.

  • space (bool, default=True) -- Whether to convert spaces.

  • ascii_symbol (bool, default=True) -- Whether to convert ASCII symbols.

  • ascii_alphabet (bool, default=True) -- Whether to convert ASCII letters.

  • ascii_digit (bool, default=True) -- Whether to convert ASCII digits.

  • wave_dash (bool, default=False) -- Whether to convert full-width wave dashes to half-width tildes.

ascii_alphabet: bool

Whether to convert ASCII letters.

ascii_digit: bool

Whether to convert ASCII digits.

ascii_symbol: bool

Whether to convert ASCII symbols.

conjunction_mark: bool

Whether to convert conjunction marks.

corner_brucket: bool

Whether to convert corner brackets.

length_mark: bool

Whether to convert length marks.

punctuation: bool

Whether to convert punctuation marks.

space: bool

Whether to convert spaces.

wave_dash: bool

Whether to convert full-width wave dashes to half-width tildes.
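
To show how per-category boolean flags like these can gate a width conversion, the sketch below defines a small stand-in dataclass. WidthFlags, build_table, and narrow are hypothetical names invented for this example; they only borrow three field names from WidthConverterConfig and say nothing about how utsuho applies its configuration internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WidthFlags:
    """Hypothetical stand-in; NOT utsuho's WidthConverterConfig."""
    space: bool = True
    ascii_alphabet: bool = True
    ascii_digit: bool = True


def build_table(flags: WidthFlags) -> dict:
    """Build a full-width to half-width table for the enabled categories."""
    table = {}
    if flags.space:
        table[0x3000] = 0x20  # ideographic space -> ASCII space
    if flags.ascii_alphabet:
        # Full-width A-Z and a-z sit 0xFEE0 above their ASCII forms.
        for cp in [*range(0xFF21, 0xFF3B), *range(0xFF41, 0xFF5B)]:
            table[cp] = cp - 0xFEE0
    if flags.ascii_digit:
        for cp in range(0xFF10, 0xFF1A):  # full-width 0-9
            table[cp] = cp - 0xFEE0
    return table


def narrow(s: str, flags: WidthFlags = WidthFlags()) -> str:
    """Convert only the character categories enabled in flags."""
    return s.translate(build_table(flags))


text = "\uff21\uff22\uff23\u3000\uff11\uff12\uff13"  # full-width "ABC 123"
print(narrow(text))                                 # all categories on
print(narrow(text, WidthFlags(ascii_digit=False)))  # digits left full-width
```

Keeping each category behind its own flag means a caller can, for instance, normalize letters and spaces while preserving full-width digits.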