Reputation: 279
Input text
str_ = '''abc xyz pq m_www.google.in_10 -name itel.google.in
abc xyz pq I_www.google.in_9 -name itel.google.com
abc xyz pq I_www.google.in_8
abc xyz pq I.www_google.com_10 -name itel_google.com_9'''
I need to extract the token that follows 'abc xyz pq ' up to the next space. This token can contain word characters (\w) and dots. I also want to extract the token that follows '-name ', when it is present. The two tokens from each line should form a list.
Expected output (as a list)
[['m_www.google.in_10', 'itel.google.in'],
['I_www.google.in_9', 'itel.google.com'],
['I_www.google.in_8', ''],
['I.www_google.com_10', 'itel_google.com_9']]
My Pseudo Code
import re
re.findall(r'abc xyz pq (\w+)\.(\w+)\.(\w+) -name? (\w+?)\.(\w+?)\.(\w+?)',str_ )
Upvotes: 4
Views: 78
Reputation: 92854
With specific regex pattern:
import re
from pprint import pprint
s = '''abc xyz pq m_www.google.in_10 -name itel.google.in
abc xyz pq I_www.google.in_9 -name itel.google.com
abc xyz pq I_www.google.in_8
abc xyz pq I.www_google.com_10 -name itel_google.com_9'''
res = list(map(list, re.findall(r'\babc xyz pq (\w+[.\w]+)(?: -name (\w+[.\w]+))?', s)))
pprint(res)
The output (a list of lists):
[['m_www.google.in_10', 'itel.google.in'],
['I_www.google.in_9', 'itel.google.com'],
['I_www.google.in_8', ''],
['I.www_google.com_10', 'itel_google.com_9']]
Regex pattern details:
\b - word boundary
(\w+[.\w]+) - captures one or more word characters (\w+) followed by a sequence of dots or word characters ([.\w]+)
(?: ...) - marks the group as non-capturing, though in the above case it contains another capturing group (the inner group)
(...)? - marks the group as optional (the ? quantifier matches between zero and one times)
Upvotes: 3
Reputation: 785376
You may use this regex in re.findall:
>>> for i in re.findall(r'abc xyz pq\s+([\w.]+)(?:\s+-name\s+([\w.]+))?', str_):
... print (i)
...
('m_www.google.in_10', 'itel.google.in')
('I_www.google.in_9', 'itel.google.com')
('I_www.google.in_8', '')
('I.www_google.com_10', 'itel_google.com_9')
Note that the list doesn't match your expected data structure but you can iterate this list and create your custom structure.
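As a minimal sketch of that conversion, each tuple returned by re.findall can be turned into a list with a comprehension. The names pattern and rows are illustrative only; the input and regex are the ones shown above.

```python
import re

str_ = '''abc xyz pq m_www.google.in_10 -name itel.google.in
abc xyz pq I_www.google.in_9 -name itel.google.com
abc xyz pq I_www.google.in_8
abc xyz pq I.www_google.com_10 -name itel_google.com_9'''

pattern = r'abc xyz pq\s+([\w.]+)(?:\s+-name\s+([\w.]+))?'

# re.findall yields one tuple per match; an optional group that did not
# participate in the match comes back as an empty string.
rows = [list(groups) for groups in re.findall(pattern, str_)]

for row in rows:
    print(row)
```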
Alternatively you may use re.finditer and prepare your custom list.
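One possible way to do this with re.finditer is sketched below. It assumes the same input and regex as above. The variable names are illustrative, and passing '' to Match.groups() is one choice for filling in the missing -name value.

```python
import re

str_ = '''abc xyz pq m_www.google.in_10 -name itel.google.in
abc xyz pq I_www.google.in_9 -name itel.google.com
abc xyz pq I_www.google.in_8
abc xyz pq I.www_google.com_10 -name itel_google.com_9'''

pattern = re.compile(r'abc xyz pq\s+([\w.]+)(?:\s+-name\s+([\w.]+))?')

result = []
for match in pattern.finditer(str_):
    # groups('') substitutes '' for the optional group when it did not match
    # (the default would be None).
    result.append(list(match.groups('')))

print(result)
```

With finditer each match object is available directly, so the default for the missing group can be set to something other than '' if that suits the data better.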
Upvotes: 3