Reputation: 79
Let's import the re module.
import re
Assume there's a string containing some data.
data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
Rob: Jan 22.3, Feb 20.0, Mar 22.0
Nick: Jan 23.4, Feb 22.0, Mar 23.4'''
For example, we want to extract floats for Rob's line only.
name = 'Rob'
I'd make it like this:
def data_extractor(name, data):
    return re.findall(r'\d+\.\d+', re.findall(r'{}.*'.format(name), data)[0])
The output is ['22.3', '20.0', '22.0'].
Is my approach Pythonic, or should it be improved somehow? It does the job, but I'm not sure whether code like this is appropriate.
Thanks for your time.
Upvotes: 1
Views: 73
Reputation: 626738
A mostly non-regex way is to split the text into lines, strip each one, check which line starts with Rob, and then grab all the float values from that line:
import re
data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
Rob: Jan 22.3, Feb 20.0, Mar 22.0
Nick: Jan 23.4, Feb 22.0, Mar 23.4'''
name = 'Rob'
lines = [line.strip() for line in data.split("\n")]
for l in lines:
    if l.startswith(name):
        print(re.findall(r'\d+\.\d+', l))
# => ['22.3', '20.0', '22.0']
See a Python demo
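The line-splitting approach above could also be wrapped in a reusable function. This is only a sketch: the function name extract_floats is hypothetical, and it assumes every record has the form "Name: ..." so that the colon can be used to keep "Rob" from also matching a name such as "Robert".

```python
import re

def extract_floats(name, data):
    """Return the float-like tokens from the first line that starts with `name:`."""
    for line in data.splitlines():
        line = line.strip()
        # Assumption: records look like "Name: ...", so include the colon
        # to avoid matching longer names that merely begin with `name`.
        if line.startswith(name + ":"):
            return re.findall(r"\d+\.\d+", line)
    return []  # no line found for this name

data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
Rob: Jan 22.3, Feb 20.0, Mar 22.0
Nick: Jan 23.4, Feb 22.0, Mar 23.4'''

print(extract_floats("Rob", data))
# => ['22.3', '20.0', '22.0']
```

Returning an empty list for an unknown name avoids the IndexError that indexing with [0] would raise when nothing matches.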
If you want a purely regex solution, you may use the PyPI regex module with a \G-based pattern:
import regex
data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
Rob: Jan 22.3, Feb 20.0, Mar 22.0
Nick: Jan 23.4, Feb 22.0, Mar 23.4'''
name = 'Rob'
rx = r'(?:\G(?!\A)|{}).*?(\d+\.\d+)'.format(regex.escape(name))
print(regex.findall(rx, data))
See the online Python demo
This pattern matches:
(?:\G(?!\A)|{}) - the end of the last successful match or the name contents
.*? - any 0+ chars other than line break chars, as few as possible
(\d+\.\d+) - Group 1 (just the value findall will return) matching 1+ digits, . and 1+ digits.
The regex.escape(name) will escape chars like (, ) etc. that might appear in the name.
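To illustrate why escaping the name matters, here is a small sketch that uses the standard library's re.escape (the stdlib counterpart of regex.escape) instead of the PyPI module. The function name extract_for_name and the sample name "Rob (Jr.)" are made up for the example, and the pattern assumes each record starts its line with "Name:".

```python
import re

def extract_for_name(name, data):
    # re.escape makes characters such as ( ) . in the name match literally.
    pattern = r"^{}:.*".format(re.escape(name))
    line = re.search(pattern, data, flags=re.MULTILINE)
    return re.findall(r"\d+\.\d+", line.group()) if line else []

sample = "Rob (Jr.): Jan 22.3, Feb 20.0\nRob: Jan 1.5"

print(extract_for_name("Rob (Jr.)", sample))
# => ['22.3', '20.0']
print(extract_for_name("Rob", sample))
# => ['1.5']

# Without escaping, "(Jr.)" is read as a capturing group, so the line is not found.
unescaped = re.search(r"^{}:.*".format("Rob (Jr.)"), sample, flags=re.MULTILINE)
print(unescaped)
# => None
```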
Upvotes: 1