Reputation: 119
I have a multiline file where one of the lines is:
node:milk1-01|name=milk1-01
So I need to parse this file to search for this line with a blueprint like:
node:________|name=________
I tried to implement this with a regex and got confused. I used the snippet below inside a loop that reads every line from the file.
x = re.findall('node:'+'\w+[-]*\d*'+'\\|name='+'\w+-\d*', line)
print(x)
Very new to this concept. Am I doing something wrong? All help is appreciated. Thanks.
Upvotes: 1
Views: 414
Reputation: 957
You are close! Regexes can contain plain text too, so there is no need to concatenate the strings the way you do. Furthermore, you seem to separate letters and digits in your attempt, but the blueprint you provide does not make clear whether that is actually necessary. Lastly, your pattern has no capturing groups, so findall returns the whole matched text rather than the two values you are after.
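import re
line = "node:milk1-01|name=milk1-01"
# Raw string, so the backslash in \| reaches the regex engine unchanged.
my_regex = re.compile(r'node:(.+)\|name=(.+)')
# One tuple per match, one element per capturing group.
matches = my_regex.findall(line)
print(matches)
# Output: [('milk1-01', 'milk1-01')]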
A few things to note:
(...): the parentheses are a capturing group. There are two sets, to capture two different parts.
.+: the . matches any character except a newline, so letters, digits, hyphens and other (readable) characters. The + means one or more of the preceding element, here the dot. But you already got that.
Final pro-tip: Use a service like Regex101 to build and troubleshoot your regexes. You can see what happens live on-screen.
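To tie this back to the loop over the file's lines from the question, here is a minimal sketch. The sample lines, the parse_nodes helper and the group names node and name are made up for illustration; in real code the list would be replaced by the open file object.

```python
import re

# Hypothetical sample standing in for the lines read from the file.
lines = [
    "header: something else\n",
    "node:milk1-01|name=milk1-01\n",
    "trailer\n",
]

# Compile once and reuse for every line. Group names are illustrative.
pattern = re.compile(r"node:(?P<node>.+)\|name=(?P<name>.+)")

def parse_nodes(lines):
    """Return a (node, name) tuple for every line that fits the blueprint."""
    results = []
    for line in lines:
        # Strip the trailing newline, then look for the blueprint anywhere in the line.
        match = pattern.search(line.rstrip("\n"))
        if match:
            results.append((match.group("node"), match.group("name")))
    return results

print(parse_nodes(lines))
```

Named groups are optional; plain (...) groups with match.group(1) and match.group(2) would work the same way.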
Upvotes: 1
Reputation: 18641
Use
re.findall(r'node:[^|]*\|name=[^|]*', line)
See proof
EXPLANATION
--------------------------------------------------------------------------------
  node:                    'node:'
--------------------------------------------------------------------------------
  [^|]*                    any character except: '|' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \|                       '|'
--------------------------------------------------------------------------------
  name=                    'name='
--------------------------------------------------------------------------------
  [^|]*                    any character except: '|' (0 or more times
                           (matching the most amount possible))
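The practical difference between [^|]* and a plain .* only shows up when a line carries more pipe-separated fields. The sketch below assumes a hypothetical third field, rack=7, which is not in the question's file, purely to make that visible.

```python
import re

# Hypothetical line with an extra field after name=...
line = "node:milk1-01|name=milk1-01|rack=7"

# [^|]* stops at the next '|', so the match ends after the name value.
strict = re.findall(r"node:[^|]*\|name=[^|]*", line)

# .* is greedy and runs to the end of the line, swallowing the extra field.
greedy = re.findall(r"node:.*\|name=.*", line)

print(strict)
print(greedy)
```

On the exact line from the question both patterns return the same match, so this only matters if the format can grow.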
Upvotes: 1
Reputation: 26179
Is this perhaps what you're looking for?
>>> import re
>>> line = 'not\nhere\nnode:milk1-01|name=milk1-01\nsomething\n'
>>> re.findall(r'node:.*\|name=.*', line)
['node:milk1-01|name=milk1-01']
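This works on the whole text at once because . does not match a newline, so each match stays on one line. If the line has to start with node:, a variation is to anchor the pattern with ^ and $ under re.MULTILINE and add capturing groups. The extra xnode line in the sample text below is invented to show what the anchor rejects.

```python
import re

# Sample text; the 'xnode:...' line is a made-up near miss.
text = "not\nhere\nnode:milk1-01|name=milk1-01\nxnode:a|name=b\n"

# With re.MULTILINE, ^ and $ match at the start and end of every line.
pattern = re.compile(r"^node:(.*)\|name=(.*)$", re.MULTILINE)

pairs = pattern.findall(text)
print(pairs)
```

Without the ^ anchor the second line would also match, starting from its second character.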
Upvotes: 2