pyramidka

Reputation: 79

Python 2.7. Extracting data from some part of a string using a regex

Let's import the re module.

import re

Assume there's a string containing some data.

data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
   Rob: Jan 22.3, Feb 20.0, Mar 22.0
   Nick: Jan 23.4, Feb 22.0, Mar 23.4'''

For example, we want to extract floats for Rob's line only.

name = 'Rob'

I'd do it like this:

def data_extractor(name, data):
    return re.findall(r'\d+\.\d+', re.findall(r'{}.*'.format(name),data)[0])

The output is ['22.3', '20.0', '22.0'].

Is this approach Pythonic, or should it be improved somehow? It does the job, but I'm not sure whether such code is appropriate.

Thanks for your time.

Upvotes: 1

Views: 73

Answers (1)

Wiktor Stribiżew

Reputation: 626738

One approach that does not need a regex to locate the line is to split the data into lines, strip each one, find the line that starts with Rob, and then grab all the float values from it:

import re
data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
   Rob: Jan 22.3, Feb 20.0, Mar 22.0
   Nick: Jan 23.4, Feb 22.0, Mar 23.4'''
name = 'Rob'
lines = [line.strip() for line in data.split("\n")]
for l in lines:
    if l.startswith(name):
        print(re.findall(r'\d+\.\d+', l))
# => ['22.3', '20.0', '22.0']

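Since the question wraps the logic in a function, here is a minimal sketch of the same line-based idea as a reusable function. The name extract_floats and the choice to return an empty list when no line matches are my own assumptions, not something the question requires. It also checks for the colon after the name, so that 'Rob' would not accidentally pick up a line for 'Robert'.

```python
import re

def extract_floats(name, data):
    """Return the decimal numbers found on the first line that starts with `name:`."""
    for line in data.splitlines():
        line = line.strip()
        # Require the trailing colon so 'Rob' does not also match 'Robert: ...'
        if line.startswith(name + ':'):
            return re.findall(r'\d+\.\d+', line)
    return []  # assumption: an unknown name yields an empty list rather than an error

data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
   Rob: Jan 22.3, Feb 20.0, Mar 22.0
   Nick: Jan 23.4, Feb 22.0, Mar 23.4'''

print(extract_floats('Rob', data))  # => ['22.3', '20.0', '22.0']
print(extract_floats('Ann', data))  # => []
```

Unlike indexing the result of findall with [0], this does not raise an IndexError when the name is missing from the data.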

If you want a purely regex-based solution, you can use the PyPI regex module with a \G-based pattern:

import regex
data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
   Rob: Jan 22.3, Feb 20.0, Mar 22.0
   Nick: Jan 23.4, Feb 22.0, Mar 23.4'''
name = 'Rob'
rx = r'(?:\G(?!\A)|{}).*?(\d+\.\d+)'.format(regex.escape(name))
print(regex.findall(rx, data))


This pattern matches:

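  • (?:\G(?!\A)|{}) - either the end of the previous successful match (\G, with (?!\A) ruling out the start of the string) or the escaped name that format() substitutes for {}
  • .*? - zero or more chars other than line break chars, as few as possible
  • (\d+\.\d+) - Group 1 (the only value findall returns, since it is the sole capturing group) matching one or more digits, a dot, and one or more digits.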

The regex.escape(name) call escapes special characters such as ( and ) that might appear in the name.
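To illustrate why escaping matters, here is a small sketch that uses only the standard re module (re.escape plays the same role there as regex.escape). The sample data with a 'Rob (Jr.)' entry and the helper name line_values are made up for the illustration. Without escaping, the parentheses in the name would be read as a capturing group and the literal text would not match.

```python
import re

# Hypothetical data: one name contains regex metacharacters
data = '''Mike: Jan 25.1
   Rob (Jr.): Jan 22.3, Feb 20.0
   Rob: Jan 21.0'''

def line_values(name, data):
    """Find the line for `name` with one multiline regex, then pull out its floats."""
    # re.escape makes '(', ')' and '.' in the name match literally
    pattern = r'^[ \t]*{}:(.*)$'.format(re.escape(name))
    m = re.search(pattern, data, flags=re.M)
    return re.findall(r'\d+\.\d+', m.group(1)) if m else []

print(line_values('Rob (Jr.)', data))  # => ['22.3', '20.0']
print(line_values('Rob', data))        # => ['21.0']

# Unescaped, the parentheses form a group, so the literal name is not found
print(re.search('Rob (Jr.)', 'Rob (Jr.)'))  # => None
```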

Upvotes: 1
