Roadmap
This roadmap outlines the planned direction of the Utsuho project.
Utsuho aims to remain a lightweight, deterministic, pure-Python library for Japanese text normalization.
The project focuses on:
deterministic character-level transformations
minimal dependencies
predictable behavior suitable for preprocessing and search normalization
Utsuho intentionally avoids dictionary-based processing and heavy linguistic frameworks.
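As an illustration of what a deterministic character-level transformation looks like, the sketch below converts full-width ASCII to half-width with a fixed code-point table. The function name fullwidth_ascii_to_halfwidth is hypothetical and is not Utsuho's API; it only shows the general technique.

```python
# Hypothetical sketch, not Utsuho's implementation.
# Full-width ASCII variants U+FF01..U+FF5E sit exactly 0xFEE0 above U+0021..U+007E.
_OFFSET = 0xFEE0
_FULL_TO_HALF = {cp: cp - _OFFSET for cp in range(0xFF01, 0xFF5F)}
_FULL_TO_HALF[0x3000] = 0x0020  # ideographic space -> ASCII space


def fullwidth_ascii_to_halfwidth(text: str) -> str:
    """Map each full-width ASCII character to its half-width form.

    Characters outside the table (kana, kanji, plain ASCII) pass through unchanged.
    """
    return text.translate(_FULL_TO_HALF)
```

Because the mapping is a static table, the same input always produces the same output, with no dictionary or context involved.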
Near Term
MCP Interface
Build on the newly added Model Context Protocol (MCP) interface so that Utsuho can be used more effectively from AI systems and developer tools.
Delivered in 2.2.0:
MCP server implementation
four conversion tools:
half_to_full
full_to_half
hiragana_to_katakana
katakana_to_hiragana
optional dependency (utsuho[mcp])
stdio transport support
width-conversion options exposed for MCP clients
Near-term improvements:
richer MCP usage examples and client integration guides
clearer compatibility guidance for MCP hosts and agent environments
possible support for additional MCP capabilities such as resources or prompts, where they fit the project's deterministic scope
This allows AI systems and agents to normalize Japanese text using Utsuho while keeping the interface explicit and predictable.
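For orientation, this is roughly what a client sends over the stdio transport to invoke one of the tools. The sketch builds a JSON-RPC 2.0 tools/call request as defined by the MCP specification. The argument name "text" is an assumption, since the tool schemas are not described here, and a real session would first complete the MCP initialize handshake.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Return one newline-terminated JSON-RPC 2.0 message for an MCP tools/call request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport delimits messages with newlines; json.dumps escapes
    # any newline inside string values, so the message stays on a single line.
    return json.dumps(message, ensure_ascii=False) + "\n"


# "text" is an assumed parameter name, used for illustration only.
request_line = build_tool_call(1, "hiragana_to_katakana", {"text": "ひらがな"})
```

In practice an MCP client library would build and send this message; the sketch only makes the wire format explicit.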
Kana → Romaji Conversion
Add deterministic kana-to-romaji conversion.
Goals:
deterministic mapping
no dictionary or morphological analysis
suitable for indexing and search preprocessing
Expected features:
Hepburn-style output
handling of:
small kana
long vowels
sokuon (っ)
yōon (きゃ, しゃ, etc.)
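A minimal sketch of how such a table-driven conversion might work is shown below. It is not the planned implementation: the name to_romaji is hypothetical, only hiragana is handled, the prolonged sound mark ー is assumed to repeat the previous vowel, long vowels written with kana (such as おう) are kept letter by letter, and the apostrophe after ん is omitted.

```python
# Hypothetical sketch of table-driven Hepburn-style romanization (hiragana only).
_VOWELS = "aiueo"
_CONSONANTS = "bcdfghjkmnprstwyz"
_ROWS = {
    "": "あいうえお", "k": "かきくけこ", "g": "がぎぐげご",
    "s": "さしすせそ", "z": "ざじずぜぞ", "t": "たちつてと",
    "d": "だぢづでど", "n": "なにぬねの", "h": "はひふへほ",
    "b": "ばびぶべぼ", "p": "ぱぴぷぺぽ", "m": "まみむめも",
    "r": "らりるれろ",
}
KANA = {k: c + v for c, row in _ROWS.items() for k, v in zip(row, _VOWELS)}
# Hepburn irregulars and the kana outside the regular rows.
KANA.update({
    "し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu",
    "じ": "ji", "ぢ": "ji", "づ": "zu",
    "や": "ya", "ゆ": "yu", "よ": "yo",
    "わ": "wa", "を": "o", "ん": "n",
})
_SMALL_Y = {"ゃ": "a", "ゅ": "u", "ょ": "o"}


def to_romaji(text: str) -> str:
    out = []
    double_next = False  # set by sokuon (っ); doubles the next consonant
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        romaji = KANA.get(ch)
        if ch == "っ":
            double_next = True
            i += 1
            continue
        if romaji and len(romaji) > 1 and romaji.endswith("i") and nxt in _SMALL_Y:
            # yōon: きゃ -> kya, but しゃ -> sha, ちゃ -> cha, じゃ -> ja
            stem = romaji[:-1]
            glide = "" if stem in ("sh", "ch", "j") else "y"
            syllable = stem + glide + _SMALL_Y[nxt]
            i += 2
        elif romaji:
            syllable = romaji
            i += 1
        elif ch == "ー" and out and out[-1][-1] in _VOWELS:
            syllable = out[-1][-1]  # assumption: repeat the previous vowel
            i += 1
        else:
            syllable = ch  # pass anything unknown through unchanged
            i += 1
        if double_next and syllable[0] in _CONSONANTS:
            # Hepburn writes っち as "tchi"; other consonants are simply doubled.
            syllable = ("t" if syllable.startswith("ch") else syllable[0]) + syllable
        double_next = False  # a sokuon with no consonant after it is dropped
        out.append(syllable)
    return "".join(out)
```

Every step is a table lookup or a fixed rule on adjacent characters, so the output depends only on the input string.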
Mid Term
Iteration Mark Expansion
Add support for expanding Japanese iteration marks.
Examples:
時々 → 時時
いろゝ → いろろ
サヽキ → ササキ
Supported characters:
々 ゝ ゞ ヽ ヾ
This feature will remain deterministic and will not require dictionaries.
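The examples above can be reproduced with a single left-to-right pass, sketched below under a few assumptions: each mark repeats exactly one preceding character, the plain marks copy that character unchanged, and the voiced marks (ゞ, ヾ) add a dakuten when a voiced form exists. The name expand_iteration_marks is hypothetical.

```python
import unicodedata

# Hypothetical sketch, not the planned implementation.
PLAIN_MARKS = frozenset("々ゝヽ")
VOICED_MARKS = frozenset("ゞヾ")
_DAKUTEN = "\u3099"  # combining voiced sound mark


def _voiced(ch: str) -> str:
    """Return the voiced form of a kana, or the character itself if none exists."""
    composed = unicodedata.normalize("NFC", ch + _DAKUTEN)
    return composed if len(composed) == 1 else ch


def expand_iteration_marks(text: str) -> str:
    out = []
    for ch in text:
        if out and ch in PLAIN_MARKS:
            out.append(out[-1])           # repeat the previous character as-is
        elif out and ch in VOICED_MARKS:
            out.append(_voiced(out[-1]))  # repeat it with a dakuten
        else:
            out.append(ch)                # includes a mark at the very start
    return "".join(out)
```

Multi-character repetitions are out of scope for this sketch, since resolving them reliably would need context beyond the preceding character.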
Performance Improvements
Continue improving performance while keeping the implementation pure Python.
Possible improvements include:
optimized string building
reduced allocations
faster lookup paths
Benchmarks will be maintained to detect performance regressions.
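A regression benchmark could be as small as the following sketch, which times two pure-Python strategies for the same hiragana-to-katakana mapping using the standard timeit module. The function names and the sample text are illustrative assumptions, not the project's benchmark suite, and no timing figures are implied.

```python
import timeit

# Hiragana U+3041..U+3096 map to katakana U+30A1..U+30F6 by a fixed offset of 0x60.
_BY_CODE_POINT = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}
_BY_CHAR = {chr(k): chr(v) for k, v in _BY_CODE_POINT.items()}


def convert_translate(text: str) -> str:
    """Single call into str.translate with a code-point table."""
    return text.translate(_BY_CODE_POINT)


def convert_join(text: str) -> str:
    """Per-character dict lookup, joined once at the end."""
    get = _BY_CHAR.get
    return "".join(get(ch, ch) for ch in text)


SAMPLE = "ひらがなとカタカナと漢字 " * 1000


def benchmark(repeat: int = 5, number: int = 20) -> dict:
    """Return the best observed seconds per call for each strategy."""
    expected = convert_translate(SAMPLE)
    results = {}
    for func in (convert_translate, convert_join):
        # Both strategies must agree before their timings are comparable.
        assert func(SAMPLE) == expected
        best = min(timeit.repeat(lambda: func(SAMPLE), repeat=repeat, number=number))
        results[func.__name__] = best / number
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds * 1e6:.1f} µs per call")
```

Taking the minimum over several repeats reduces noise from other processes, which makes run-to-run comparisons more stable.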
Long Term
API Stabilization
Review and refine the public API as the project grows.
Possible improvements include:
improved configuration design
clearer separation between normalization types
better extensibility for future converters
Breaking API changes, if necessary, would be introduced in a future major release.
Non-Goals
To keep the project focused and maintainable, Utsuho intentionally avoids:
dictionary-based processing
morphological analysis
heavy linguistic frameworks
native extensions
The project will remain a pure-Python deterministic normalization library.