Regex is not set up for non Roman based languages. Lack of built in classes and lack of spaces denoting word boundaries make for challenges.
However there is at least one implementation of regex that has features for Japanese. Oniguruma regular expression library by K. Kosako. It is used in Textmate.
[一-龠] All kanji code points (possibly!) [ぁ-ヿ] All kana codepoints
from “Japanese for iOS” dictionary to 3 column tab delimited text.
Pass 1 #remove brackets
Match:(([ぁ-ヿ、]+))\n(.+)\n\n Replace: \t$1\t$2\n
Pass 2 #kana only entries
Match: ([ぁ-ヿ、]+)\n(.+)\n\n Replace: $1\t$1\t$2\n
Pass 3 remove extra definitions
Match: (;(.+)\n) Replace:
Using Conditional Technique by
www.rexegg.com/regex-trick-conditional-replacement.html
Match: (?sx)([ァ-ヺ])(?=.*:\1=(.)) Replace:** $2
regex101: build, test, and debug regex
This dictionary table needs to be pasted at the bottom of the file for this method to work.
Dictionary:ァ=ぁ:ア=あ:ィ=ぃ:イ=い:ゥ=ぅ:ウ=う:ェ=ぇ:エ=え:ォ=ぉ:オ=お:カ=か:ガ=が:キ=き:ギ=ぎ:ク=く:グ=ぐ:ケ=け:ゲ=げ:コ=こ:ゴ=ご:サ=さ:ザ=ざ:シ=し:ジ=じ:ス=す:ズ=ず:セ=せ:ゼ=ぜ:ソ=そ:ゾ=ぞ:タ=た:ダ=だ:チ=ち:ヂ=ぢ:ッ=っ:ツ=つ:ヅ=づ:テ=て:デ=で:ト=と:ド=ど:ナ=な:ニ=に:ヌ=ぬ:ネ=ね:ノ=の:ハ=は:バ=ば:パ=ぱ:ヒ=ひ:ビ=び:ピ=ぴ:フ=ふ:ブ=ぶ:プ=ぷ:ヘ=へ:ベ=べ:ペ=ぺ:ホ=ほ:ボ=ぼ:ポ=ぽ:マ=ま:ミ=み:ム=む:メ=め:モ=も:ャ=ゃ:ヤ=や:ュ=ゅ:ユ=ゆ:ョ=ょ:ヨ=よ:ラ=ら:リ=り:ル=る:レ=れ:ロ=ろ:ヮ=ゎ:ワ=わ:ヰ=ゐ:ヱ=ゑ:ヲ=を:ン=ん:ヴ=ゔ:ヵ=ヵ:ヶ=ヶ:ヷ=わ゙:ヸ=ゐ゙:ヹ=ゑ゙:ヺ=ヺ
Match: (?sx)([ぁ-ゔ])(?=.*:\1=(.)) Replace: $2
regex101: build, test, and debug regex
This dictionary table needs to be pasted at the bottom of the file for this method to work.
Dictionary:ぁ=ァ:あ=ア:ぃ=ィ:い=イ:ぅ=ゥ:う=ウ:ぇ=ェ:え=エ:ぉ=ォ:お=オ:か=カ:が=ガ:き=キ:ぎ=ギ:く=ク:ぐ=グ:け=ケ:げ=ゲ:こ=コ:ご=ゴ:さ=サ:ざ=ザ:し=シ:じ=ジ:す=ス:ず=ズ:せ=セ:ぜ=ゼ:そ=ソ:ぞ=ゾ:た=タ:だ=ダ:ち=チ:ぢ=ヂ:っ=ッ:つ=ツ:づ=ヅ:て=テ:で=デ:と=ト:ど=ド:な=ナ:に=ニ:ぬ=ヌ:ね=ネ:の=ノ:は=ハ:ば=バ:ぱ=パ:ひ=ヒ:び=ビ:ぴ=ピ:ふ=フ:ぶ=ブ:ぷ=プ:へ=ヘ:べ=ベ:ぺ=ペ:ほ=ホ:ぼ=ボ:ぽ=ポ:ま=マ:み=ミ:む=ム:め=メ:も=モ:ゃ=ャ:や=ヤ:ゅ=ュ:ゆ=ユ:ょ=ョ:よ=ヨ:ら=ラ:り=リ:る=ル:れ=レ:ろ=ロ:ゎ=ヮ:わ=ワ:ゐ=ヰ:ゑ=ヱ:を=ヲ:ん=ン:ゔ=ヴ:ヵ=ヵ:ヶ=ヶ:わ゙=ヷ:ゐ゙=ヸ:ゑ゙=ヹ:ヺ=ヺ
I have since found
A blog post on Japanese unicode codepoints
and
Python unicode regex groups
comment at Mastodon