
Reputation: 632

Regular expression doesn't extract whole the id from a log file?

I have the following input in a log file and want to capture each ID in full, but my regular expression returns only part of it:

id:A2uhasan30hamwix١٦٠٢٢٧١٣٣٣١١٣٥٤ 
id:A2uhasan30hamwix160212145302428 
id:A2uhasan30hamwix١٦٠٢٠٩١٣٠١٥٠٠١١ 
id:A2uhasan30hamwix١٦٠٢٠٩١٦٤٧٣٩٧٣٢ 
id:A2uhasan30hamwix١٦٠٢٠٨١٩٢٨٠١٩٠٧ 
id:A2uhasan30hamwix160207145023750

I originally used the following regular expression with Python 2.7:

RE_SID = re.compile(r'sid:(<<")?(?P<sid>([A-Za-z0-9._+]*))', re.U)

I then changed sid to id and tried this:

>>> RE_SID = re.compile(ur'id:(<<")?(?P<sid>[A-Za-z\d._+]*)', re.U)
>>> sid = RE_SID.search('id:A2uhasan30hamwix١٦٠٢٢٧١٣٣٣١١٣٥٤').group('sid')
>>> sid
'A2uhasan30hamwix'

and this is my result:

is: A2uhasan30hamwix

Edit: this is how I am reading the log file:

with open(cfg.log_file) as input_file: ...
     fields = line.strip().split(' ')

and an example of line in log:

2015-11-30T23:58:13.760950+00:00 calxxx enexxxxce[10476]: INFO consume_essor: user:<<"ailxxxied">> callee_num:<<"+144442567413">> id:<<"A2uhasan30hamwix١٦٠٢٠٨١٩٢٨٠١٩٠٧">> credits:0.0 result:ok provider:sipovvvv1.yv.vs

I would appreciate help fixing my regular expression.

Upvotes: 3

Answers (3)

Wiktor Stribiżew

Reputation: 627101

Based on what we discussed in the chat, posting the solution:

import codecs
import re

RE_SID = re.compile(ur'id:(<<")?(?P<sid>[A-Za-z\d._+]*)', re.U) # \d used to match non-ASCII digits, too
input_file = codecs.open(cfg.log_file, encoding='utf-8')  # Read the file with UTF8 encoding
for line in input_file:
    fields = line.strip().split(u' ') # u prefix is important!
    if len(fields) >= 11:
        try:
            # ......
            sid = RE_SID.search(fields[7]).group('sid') # Or check if there is a match first
        except AttributeError:
            # search() returned None, so this field holds no id (handler added as an assumption)
            continue
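
The "check if there is a match first" remark can be sketched as follows. This is only an illustration and assumes Python 3, where str patterns are Unicode-aware by default, so neither the ur'' prefix nor re.U is needed. The helper name extract_sid is hypothetical, and the sample line is the one from the question.

```python
import re

# In Python 3, \d in a str pattern matches any Unicode decimal digit,
# including the Arabic-Indic digits used in some of the ids.
RE_SID = re.compile(r'id:(?:<<")?(?P<sid>[A-Za-z\d._+]+)')


def extract_sid(line):
    """Return the id found in a log line, or None when nothing matches."""
    match = RE_SID.search(line)
    return match.group('sid') if match else None


sample = ('2015-11-30T23:58:13.760950+00:00 calxxx enexxxxce[10476]: INFO '
          'consume_essor: user:<<"ailxxxied">> callee_num:<<"+144442567413">> '
          'id:<<"A2uhasan30hamwix١٦٠٢٠٨١٩٢٨٠١٩٠٧">> credits:0.0 result:ok '
          'provider:sipovvvv1.yv.vs')

print(extract_sid(sample))
print(extract_sid('a line without the field'))
```

Searching the whole line avoids depending on the field position, at the cost of matching the first id: that appears anywhere in the line.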

Upvotes: 1

Ash Ishh

Reputation: 588

string = '''
id:A2uhasan30hamwix١٦٠٢٢٧١٣٣٣١١٣٥٤ 
id:A2uhasan30hamwix160212145302428 
id:A2uhasan30hamwix١٦٠٢٠٩١٣٠١٥٠٠١١ 
id:A2uhasan30hamwix١٦٠٢٠٩١٦٤٧٣٩٧٣٢ 
id:A2uhasan30hamwix١٦٠٢٠٨١٩٢٨٠١٩٠٧ 
id:A2uhasan30hamwix160207145023750
'''
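import re
reObj = re.compile(r'id:.*')
# A compiled pattern's findall() takes (string, pos, endpos), not flags;
# passing re.DOTALL there is read as a start offset and skips the first id.
ans = reObj.findall(string)

print(ans)

Output:

['id:A2uhasan30hamwix١٦٠٢٢٧١٣٣٣١١٣٥٤ ', 
 'id:A2uhasan30hamwix160212145302428 ', 
 'id:A2uhasan30hamwix١٦٠٢٠٩١٣٠١٥٠٠١١ ', 
 'id:A2uhasan30hamwix١٦٠٢٠٩١٦٤٧٣٩٧٣٢ ', 
 'id:A2uhasan30hamwix١٦٠٢٠٨١٩٢٨٠١٩٠٧ ', 
 'id:A2uhasan30hamwix160207145023750']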
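
If only the id values are wanted, without the id: prefix or the trailing space, a capture group with \S+ is one possible variant. This is a minimal sketch assuming Python 3 and the bare id:... form shown above; it would need adjusting for the quoted <<"...">> form in the real log. The names log_text and RE_ID are illustrative.

```python
import re

log_text = (
    'id:A2uhasan30hamwix١٦٠٢٢٧١٣٣٣١١٣٥٤ \n'
    'id:A2uhasan30hamwix160212145302428 \n'
    'id:A2uhasan30hamwix160207145023750\n'
)

# With exactly one capture group, findall() returns only the captured text.
# \S+ stops at the first whitespace, so trailing spaces are not included.
RE_ID = re.compile(r'id:(\S+)')
ids = RE_ID.findall(log_text)

print(ids)
```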

Upvotes: 0

alecxe

Reputation: 474031

Three things to fix:

use id instead of sid
use \d instead of 0-9 so that, with re.U, the Arabic-Indic digits are matched too
drop the extra capturing group inside the sid named group

Fixed version:

id:(<<")?(?P<sid>[A-Za-z\d_.+]+)
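
The difference between 0-9 and \d can be seen side by side. This is a small sketch assuming Python 3, where str patterns match Unicode digits with \d by default; in Python 2.7 the input would also have to be a unicode object for re.U to have this effect. The variable names are illustrative.

```python
import re

text = 'id:A2uhasan30hamwix١٦٠٢٢٧١٣٣٣١١٣٥٤'

# 0-9 covers ASCII digits only, so the match stops at the first Arabic-Indic digit.
ascii_only = re.compile(r'id:(<<")?(?P<sid>[A-Za-z0-9_.+]+)')
# \d covers every Unicode decimal digit, so the whole id is captured.
unicode_digits = re.compile(r'id:(<<")?(?P<sid>[A-Za-z\d_.+]+)')

short_sid = ascii_only.search(text).group('sid')
full_sid = unicode_digits.search(text).group('sid')

print(short_sid)
print(full_sid)
```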

Upvotes: 1
