[Backup diff of Python/正規表現 (No.2)]


--- 1: 2016-12-02 (Fri) 14:15:29 njf
+++ 2: 2016-12-02 (Fri) 16:02:27 njf
@@ Line 1: / Line 1: @@
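-*Introduction [#d25f2b03]
+*General notes on using regular expressions in Python [#d25f2b03]
 Python's regular expressions differ subtly from those of other languages, and in the 2.7 series in particular you need to be careful about how character encodings are handled.
+First, if everything is the str type (a byte string in Python 2.7), the code runs normally.
+ # -*- coding: utf-8 -*-
+ import re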
+ testStr = "あいうえお"
+ p = re.compile("あ")
+ print p.sub("か",testStr)
+Result
+ かいうえお
+Next, it also works when everything is the unicode type (u"..." literals).
+ # -*- coding: utf-8 -*-
+ import re
+ testStr = u"あいうえお"
+ p = re.compile(u"あ")
+ print p.sub(u"か",testStr)
+Result
+ かいうえお
+If the pattern and the target are unicode but the replacement is a str containing non-ASCII bytes, the call raises an error.
+ # -*- coding: utf-8 -*-
+ import re
+ testStr = u"あいうえお"
+ p = re.compile(u"あ")
+ print p.sub("か",testStr)
+Result
+ Traceback (most recent call last):
+   File "regExp.py", line 9, in <module>
+     print p.sub("か",testStr)
+ UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
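+The error message can be reproduced without re. The following is a minimal sketch, assuming the source file is saved as UTF-8: "か" is three bytes in UTF-8, the first being 0xe3, and decoding those bytes with the ascii codec (which Python 2.7 attempts implicitly when a str meets a unicode) fails at position 0. The names ka_bytes, first_byte, and message are chosen here for illustration, and the sketch is written to run on both 2.7 and 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# UTF-8 bytes of the replacement string used in the failing example.
ka_bytes = u"か".encode("utf-8")
first_byte = bytearray(ka_bytes)[0]
print(len(ka_bytes), hex(first_byte))

# Decoding with the ascii codec fails on the very first byte.
message = None
try:
    ka_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)
```

+The printed message should name the same byte and position as the traceback above.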
+If you search a str with a unicode pattern, or the reverse, nothing matches.
+ # -*- coding: utf-8 -*-
+ import re
+ testStr = "あいうえお" # str
+ p = re.compile(u"あ")  # unicode
+ print p.sub("か",testStr)
+Result
+ あいうえお
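+This does not raise an error, so it can turn into a bug that is quite hard to track down.
+Always keep the pattern, the replacement, and the target string in the same type.
+If the text contains Japanese, standardizing on the unicode type is recommended.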
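+One way to follow this advice is to decode byte strings into unicode at the point where they enter the program. This is a minimal sketch rather than a definitive recipe: to_text is a hypothetical helper, and the utf-8 default is an assumption about the input. It is written to run on both 2.7 and 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass unicode through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# A byte string, such as one read from a file or a socket.
raw = u"あいうえお".encode("utf-8")
pattern = re.compile(u"あ")

# After decoding, the pattern, replacement, and target are all unicode.
result = pattern.sub(u"か", to_text(raw))
print(result)
```

+Decoding once at the boundary keeps every later regular expression call free of mixed types.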
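+Regular expression methods can generally be run in two ways: by creating a pattern object with re.compile and calling its methods, or by calling the module-level functions on re directly.
+ # -*- coding: utf-8 -*-
+ import re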
+ testStr = u"あいうえお"
+ p = re.compile(u"あ")
+ print p.sub(u"か",testStr)
+ print re.sub(u"あ", u"か", testStr)
+Result
+ かいうえお
+ かいうえお
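+If you reuse the same search pattern many times, compiling it is the better choice.
+If the pattern itself changes dynamically, calling the functions on re is convenient.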
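+The two styles might be combined as in the sketch below. The sample data and the helper replace_literal are made up for illustration. Using re.escape is an addition not discussed on this page: it is assumed here that the dynamic text should be matched literally, so regular expression metacharacters such as "." are escaped first.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re

lines = [u"あいうえお", u"かきくけこ", u"あかさたな"]

# Fixed pattern reused for every line: compile it once.
p = re.compile(u"あ")
compiled_results = [p.sub(u"ア", line) for line in lines]
print(compiled_results)


def replace_literal(target, replacement, text):
    """Hypothetical helper: replace a literal string chosen at run time."""
    return re.sub(re.escape(target), replacement, text)


# Pattern changes on each call: use the module-level function.
text = u"あ.い.う"
dynamic_results = [replace_literal(t, u"*", text) for t in [u"あ", u"."]]
print(dynamic_results)
```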
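+The rest of this page assumes you already know basic regular expression syntax and focuses on how to use it from Python.
+*Substitution [#l1b0b7a6]
+Substitution uses sub, already seen in the examples above, and subn.
+sub returns the replaced string; subn returns a tuple of the replaced string and the number of replacements made.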
+ # -*- coding: utf-8 -*-
+ import re
+ testStr = u"あいあいうえお"
+ p = re.compile(u"あ")
+ print p.sub(u"か",testStr)
+ for n in p.subn( u"か", testStr):
+     print n
+Result
+ かいかいうえお
+ かいかいうえお
+ 2
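+Rather than looping over the tuple, it is often clearer to unpack it. This is a small sketch with variable names chosen for illustration. It also shows the optional count argument, which limits how many replacements are made.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re

# subn returns (new_string, number_of_replacements).
new_text, count = re.subn(u"あ", u"か", u"あいあいうえお")
print(new_text)
print(count)

# count=1 stops after the first replacement.
limited_text, limited_count = re.subn(u"あ", u"か", u"あいあいうえお", count=1)
print(limited_text, limited_count)
```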
+Grouping with () and references to those groups are also supported.
+ # -*- coding: utf-8 -*-
+ import re
+ testStr = u"あいあいうえお"
+ p = re.compile(u"(.)あ")
+ print p.sub(u'\\1か',testStr)
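+Result
+ あいかいうえお
+The example above changes an "あ" to "か" only when it is preceded by another character. Without the "\\1", the whole two-character match would be replaced by "か", so the result would be "あかいうえお" and the "い" would be lost. If there are several groups, refer to them as "\\2", "\\3", and so on.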
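+As a sketch of how several groups can be referenced, the example below swaps two captured groups with "\\2\\1". It also repeats the example above using a named group, which is an alternative not covered on this page. The group name prev is made up for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re

text = u"あいあいうえお"

# Two groups: swap the character before "あ" with the "あ" itself.
swapped = re.sub(u"(.)(あ)", u"\\2\\1", text)
print(swapped)

# Same replacement as the example above, using a named group.
named = re.sub(u"(?P<prev>.)あ", u"\\g<prev>か", text)
print(named)
```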
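+*Searching [#u50983a9]
+Python provides the following regular expression search methods.
+|Method|Behavior|
+|match|Matches only at the beginning of the string. Returns None if there is no match.|
+|search|Returns a Match object for the first match anywhere in the string. Returns None if there is no match.|
+|findall|Returns the matched substrings as a list.|
+|finditer|Returns an iterator of Match objects for the matches.|
+match and search are used especially often to check whether a string is present.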
+ # -*- coding: utf-8 -*-
+ import re
+ testStr = u"あいうえお"
+ checkStr = u"うえお"
+ if re.search(checkStr, testStr):
+     print "searched"
+ if re.match(checkStr, testStr):
+     print "matched"
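+Result
+ searched
+This example uses no regular expression syntax, so the "in" operator would work just as well.
+Because "match" only hits at the beginning of the string, it is used less often than "search".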
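+The difference can be checked directly on the returned values. The sketch below uses variable names chosen for illustration and runs on both 2.7 and 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re

text = u"あいうえお"

found = re.search(u"うえお", text)    # match anywhere in the string
at_start = re.match(u"うえお", text)  # only tried at position 0
prefix = re.match(u"あい", text)      # succeeds: the text starts with it
contains = u"うえお" in text          # plain substring test, no regex

print(found.span())
print(at_start)
print(prefix.group())
print(contains)
```

+Since both functions return None on failure, the result can be used directly in an if statement, as in the example above.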
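+If you need the matched strings themselves, use "findall".
+If you need more detail, use "finditer", which returns Match objects that carry information such as the start and end positions.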
+ # -*- coding: utf-8 -*-
+ import re
+ testStr = u"にわにはにわにわとりがいる"
+ checkStr = u"にわ"
+ resultList = re.findall(checkStr,testStr)
+ for r in resultList:
+     print r
+ print
+ for r in re.finditer(checkStr,testStr):
+     print r.group() # matched string
+     print r.start(),r.end() # start position, end position
+     print r.span() # tuple of (start, end)
+Result
+ にわ
+ にわ
+ にわ
+ 
+ にわ
+ 0 2
+ (0, 2)
+ にわ
+ 4 6
+ (4, 6)
+ にわ
+ 6 8
+ (6, 8)
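+The same positions can be collected into a list in one step. The sketch below also shows a detail not covered above: when the pattern contains a group, findall returns the group contents instead of the whole match. The variable names are chosen for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re

text = u"にわにはにわにわとりがいる"

# Collect the (start, end) span of every match.
spans = [m.span() for m in re.finditer(u"にわ", text)]
print(spans)

# With one group in the pattern, findall returns only that group.
after_ni = re.findall(u"に(.)", text)
print(len(after_ni))
```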
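+*Summary [#c46f1aa9]
+Both searching and substitution have a few more features, but these are the ones used most often.
+Character encoding handling and the difference between search and match are common pitfalls, so take care with them.
+The regular expression syntax itself is largely the same as that used in other languages.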


Backup list for Python/正規表現
- 1: 2016-12-02 (Fri) 14:15:29 njf
- 2: 2016-12-02 (Fri) 16:02:27 njf
- Current: 2017-08-11 (Fri) 23:32:46 njf
