[Backup source of Python/Handling Unicode (No.2)]

*Unicode Issues in Python [#xb7d4a12]

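Handling Unicode in Python can be quite troublesome.
Version 3 improved matters, but in the version 2 series the unicode and str types are not interchangeable, and errors appear unless you keep track of whether a given string is a str or a unicode object.
Even something as simple as printing text to the console can raise an error.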

Using version 3 or later avoids most of this, but in practice the version 2 series is still often used, for example because of library support.

This page describes how to handle Unicode in that situation.

*Using Unicode in Practice [#e84d1365]

First, to write non-ASCII string literals in source code, you need to put the following comment on the first or second line of the source file.

 # -*- coding: utf-8 -*-

This tells Python that the file is encoded in UTF-8.
Without it, Python 2 raises a SyntaxError when the file contains non-ASCII characters.

To write a Unicode string literal in the source, prefix it with "u":

 utfString = u"これはutfです"

The "u" prefix makes the literal a unicode object rather than a byte string (str).
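As a rough illustration of what the "u" prefix changes, the sketch below compares a unicode literal with its UTF-8 encoded form. The variable names are made up for this example, and the code is written so that it should run on both the version 2 and version 3 series.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; the variable names are hypothetical.
text = u"これはutfです"        # unicode object: a sequence of characters
data = text.encode("utf-8")   # byte string holding the UTF-8 encoding

char_count = len(text)        # counts characters
byte_count = len(data)        # counts bytes; each kana takes 3 bytes in UTF-8

print(char_count)
print(byte_count)
```

The two lengths differ because len counts characters for the unicode object and bytes for the encoded string.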

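To write unicode objects to standard output, wrap the stream as shown below so that the text is encoded before it is written.
Otherwise, printing non-ASCII text can raise a UnicodeEncodeError, for example when the output is piped or the console encoding is ASCII.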

 import codecs
 import sys

 sys.stdout = codecs.getwriter('utf_8')(sys.stdout)

Standard input and standard error can be wrapped in the same way.

 sys.stdin = codecs.getreader('utf-8')(sys.stdin)
 sys.stderr = codecs.getwriter('utf-8')(sys.stderr)
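To see what such a wrapper does, the following sketch wraps an in-memory byte stream instead of the real sys.stdout. Using io.BytesIO as a stand-in is an assumption made only so the written bytes can be inspected; the names raw_stream and writer are hypothetical.

```python
# -*- coding: utf-8 -*-
import codecs
import io

# Stand-in for a byte-oriented stream such as Python 2's sys.stdout.
raw_stream = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; calling it wraps the stream.
writer = codecs.getwriter("utf-8")(raw_stream)

# Unicode text written to the wrapper is encoded before it reaches the stream.
writer.write(u"これはUnicodeです\n")

written_bytes = raw_stream.getvalue()
is_utf8 = written_bytes == u"これはUnicodeです\n".encode("utf-8")
print(is_utf8)
```

On version 3, sys.stdout is already a text stream, so this kind of wrapping is normally unnecessary there.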

To convert between unicode and str, use the encode method for unicode -> str and the decode method for str -> unicode.

 utfString = u"これはUnicodeです"
 strString = "これはstrです"


 encodedUtf = utfString.encode("utf_8") # unicode to str
 decodedStr = strString.decode("utf_8") # str to unicode

 print type(encodedUtf)
 print type(decodedStr)

Result:
 <type 'str'>
 <type 'unicode'>

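Calling them the other way around raises an error for non-ASCII text, because Python 2 first performs an implicit conversion with the ASCII codec.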
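The sketch below makes that implicit ASCII step explicit, so it should behave the same on the version 2 and version 3 series. It is only an illustration of where the errors come from, and the variable names are hypothetical.

```python
# -*- coding: utf-8 -*-
text = u"これはUnicodeです"
data = text.encode("utf-8")   # unicode -> bytes

# Correct direction: decoding the bytes restores the original text.
round_trip = data.decode("utf-8")

# Python 2's str.encode() first decodes the bytes with the ASCII codec.
# Doing that step explicitly shows where the error comes from.
try:
    data.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Python 2's unicode.decode() first encodes the text with the ASCII codec.
try:
    text.encode("ascii")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

print(round_trip == text)
print(type(decode_error).__name__)
print(type(encode_error).__name__)
```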

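In practice it is hard to remember which arguments and return values of the libraries you use are unicode and which are str, so you often end up adding an encode or decode call whenever an error appears.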

When you do not know whether a value is unicode or str, or it may be either, and you always want a unicode result, it is convenient to define a function like the following using isinstance.

 def convertToUtf(s):
     if isinstance(s, unicode):
         return s
     return s.decode("utf_8")

This way you always get a unicode string. A similar function can be defined for str.
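One possible shape for such a pair of helpers is sketched below. The names to_text and to_bytes are hypothetical and are not part of any library. The type lookups are an assumption made so that the same code should run on both the version 2 and version 3 series.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper names; not part of any library.
text_type = type(u"")    # unicode on Python 2, str on Python 3
bytes_type = type(b"")   # str on Python 2, bytes on Python 3


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it if it is a byte string."""
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)


def to_bytes(value, encoding="utf-8"):
    """Return value as a byte string, encoding it if it is text."""
    if isinstance(value, bytes_type):
        return value
    return value.encode(encoding)


sample_text = u"これはUnicodeです"
sample_bytes = sample_text.encode("utf-8")

print(to_text(sample_bytes) == sample_text)
print(to_bytes(sample_text) == sample_bytes)
```

Calling either helper on a value that is already of the target type returns it unchanged, so they can be applied at library boundaries without checking the type first.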

When you actually develop on the version 2 series, Unicode-related errors are a frequent source of frustration.

 UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

I have lost count of how many times I have seen this error.
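When reading this message, it can help to know that the exception object carries the same details as attributes. The sketch below triggers the error deliberately with an explicit ASCII encode. The three-character sample string is an assumption chosen so that the failing range is positions 0 to 2.

```python
# -*- coding: utf-8 -*-
try:
    u"これは".encode("ascii")   # three non-ASCII characters
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The exception records which codec failed and the failing range.
print(error.encoding)   # codec name
print(error.start)      # index of the first character that failed
print(error.end)        # index just past the last character that failed
print(error.reason)
```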


The fundamental fix is to move to version 3 or later as early as possible.

Backups of this page:
- 1: 2016-12-03 (Sat) 15:47:41 njf
- 2: 2016-12-03 (Sat) 16:20:07 njf
- Current: 2016-12-05 (Mon) 19:36:52 njf
