Using re.findall to extract data from a line

Question

I am trying (and failing so far) to extract the time and two measurement values from a line of text read from a file.

The lines have the following format:

"2013-08-07-21-25   26.0   1015.81"

I tried (among others):

>>> re.findall(r"([0-9,-]+)|(\d+.\d+)", "2013-08-07-21-25   26.0   1015.81")
[('2013-08-07-21-25', ''), ('26', ''), ('0', ''), ('1015', ''), ('81', '')]

And only got entertaining (but not desired) results.

I would like to find a solution like this:

date, temp, press = re.findall(r"The_right_stuff", "2013-08-07-21-25   26.0   1015.81")
print date + '\n' + temp + '\n' + press + '\n'
2013-08-07-21-25
26.0
1015.81

Even better if the assignment could be stuck into a test to check if the number of matches is correct.

if len(date, temp, press = re.findall(r"The_right_stuff", "2013-08-07-21-25   26.0   1015.81")) == 3:
    print 'Got good data.'
    print date + '\n' + temp + '\n' + press + '\n'

The lines have been transmitted over a serial connection and might have bad (i.e. unexpected) characters interspersed, so separating the fields by string index does not work.

See Prevent datetime.strptime from exit in case of format mismatch.

Edit @hjpotter92

I mentioned there were corrupted lines from the serial transmission. The split solution fails on lines like the examples below.

2013-08-1q-07-15   23.8   1014.92
2013-08-11-07-20   23.8   101$96
6113-p8-11-0-25   23.8   1015*04

Assigning the list of measurements into a numpy array failed.

>>> p_arr = np.asfarray(p_list, dtype='float')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/type_check.py", line 105, in asfarray
    return asarray(a, dtype=dtype)
  File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 460, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: invalid literal for float(): 101$96
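
The traceback shows that a corrupted field such as 101$96 survives the split and only fails once float() sees it. One possible way to guard against that is to convert each field individually and drop the ones that fail before building the array. This is only a sketch: the helper name to_float_or_none and the sample list raw_pressures are made up for illustration and are not part of the original code.

```python
def to_float_or_none(text):
    """Return float(text), or None when text cannot be parsed as a number."""
    try:
        return float(text)
    except ValueError:
        return None

# Hypothetical pressure fields, as they might look after splitting
# one good line and two of the corrupted lines shown above.
raw_pressures = ["1014.92", "101$96", "1015*04"]

# Keep only the values that converted cleanly.
p_list = [v for v in map(to_float_or_none, raw_pressures) if v is not None]
print(p_list)
```

Note that float() also accepts strings such as "nan" or "1e5", so this check is looser than a pattern that only allows plain decimal numbers.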

I put the set of data here.

hjpotter92 · Accepted Answer

Use re.split, since the fields are separated by runs of whitespace:

date, temp, press = re.split(r'\s+', "2013-08-07-21-25   26.0   1015.81")

>>> import re
>>> date, temp, press = re.split(r'\s+', "2013-08-07-21-25   26.0   1015.81")
>>> print date
2013-08-07-21-25
>>> print temp
26.0
>>> print press
1015.81
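
For the corrupted lines mentioned in the question's edit, splitting on whitespace still yields three fields, so bad characters go unnoticed. A stricter alternative might be to match the whole line against one anchored pattern and treat any non-match as a bad line. The sketch below assumes the timestamp is always four digits followed by four two-digit groups, and that both measurements are plain decimals; the names LINE_RE and parse_line are hypothetical.

```python
import re

# Assumed line shape: YYYY-MM-DD-HH-MM, then two decimal numbers,
# separated by whitespace, with nothing else on the line.
LINE_RE = re.compile(r"^(\d{4}(?:-\d{2}){4})\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")

def parse_line(line):
    """Return (date, temp, press) as strings, or None if the line is malformed."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.groups()

samples = [
    "2013-08-07-21-25   26.0   1015.81",
    "2013-08-1q-07-15   23.8   1014.92",
    "2013-08-11-07-20   23.8   101$96",
    "6113-p8-11-0-25   23.8   1015*04",
]

for line in samples:
    fields = parse_line(line)
    if fields is not None:
        # Only fully matching lines reach this point, so there are exactly three fields.
        date, temp, press = fields
        print(date + "\n" + temp + "\n" + press)
```

Only the first sample passes; the other three are rejected. The pattern checks the shape of the line, not whether the values are plausible, so a corrupted but well-formed year such as 6113 would still get through and would need a separate range check.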
