Reputation: 1716
I am trying (and so far failing) to extract a timestamp and two measurement values from a line of text read from a file.
The lines have the following format:
"2013-08-07-21-25 26.0 1015.81"
I tried (among other things):
>>> re.findall(r"([0-9,-]+)|(\d+.\d+)", "2013-08-07-21-25 26.0 1015.81")
[('2013-08-07-21-25', ''), ('26', ''), ('0', ''), ('1015', ''), ('81', '')]
This only produced entertaining (but not the desired) results.
I would like to find a solution like this:
date, temp, press = re.findall(r"The_right_stuff", "2013-08-07-21-25 26.0 1015.81")
print date + '\n' + temp + '\n' + press + '\n'
2013-08-07-21-25
26.0
1015.81
Even better would be if the assignment could be wrapped in a test that checks whether the number of matches is correct, something like:
if len(date, temp, press = re.findall(r"The_right_stuff", "2013-08-07-21-25 26.0 1015.81")) == 3:
    print 'Got good data.'
    print date + '\n' + temp + '\n' + press + '\n'
The lines have been transmitted via a serial connection and may have bad (i.e. unexpected) characters interspersed, so separating the fields by string index does not work.
See also the related question "Prevent datetime.strptime from exit in case of format mismatch".
Edit @hjpotter92
I mentioned that there are corrupted lines from the serial transmission. The split solution fails on lines like these:
2013-08-1q-07-15 23.8 1014.92
2013-08-11-07-20 23.8 101$96
6113-p8-11-0-25 23.8 1015*04
Converting the resulting list of measurements to a NumPy array failed:
>>> p_arr= np.asfarray(p_list, dtype='float')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/numpy/lib/type_check.py", line 105, in asfarray
return asarray(a, dtype=dtype)
File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 460, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: invalid literal for float(): 101$96
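The failure can be reproduced without NumPy. The following is a minimal sketch, using only the three corrupted sample lines quoted above; the variable names `corrupted` and `counts` are made up for illustration. It shows that each corrupted line still splits into exactly three fields, so counting fields cannot detect the damage, which only surfaces when a field is converted to a number.

```python
import re

# The three corrupted sample lines from above.
corrupted = [
    "2013-08-1q-07-15 23.8 1014.92",
    "2013-08-11-07-20 23.8 101$96",
    "6113-p8-11-0-25 23.8 1015*04",
]

# Every corrupted line still splits into exactly three fields.
counts = [len(re.split(r"\s+", line)) for line in corrupted]
print(counts)

# The damage only shows up when a field is converted to a number.
try:
    float("101$96")
except ValueError as exc:
    print("conversion failed: %s" % exc)
```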
I put the set of data here.
Upvotes: 1
Views: 163
Reputation: 67968
print [i+j for i,j in re.findall(r"\b(\d+(?!\.)(?:[,-]\d+)*)\b|\b(\d+\.\d+)\b", "2013-08-07-21-25 26.0 1015.81")]
You have to prevent the first group from consuming anything that is meant for the second group.
Output: ['2013-08-07-21-25', '26.0', '1015.81']
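If corrupted lines should be rejected outright, one possible variation is to anchor a single pattern to the whole line and test whether it matched before unpacking. This is only a sketch under assumptions: the pattern `LINE_RE` and the helper `parse_line` are hypothetical names, and the pattern assumes a timestamp of the form YYYY-MM-DD-HH-MM followed by two unsigned decimal numbers, so negative temperatures or integer readings would need a different pattern.

```python
import re

# Hypothetical strict pattern: five dash-separated digit groups for the
# timestamp, then two unsigned decimal numbers, separated by whitespace.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}-\d{2})\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$"
)

def parse_line(line):
    """Return (date, temp, press) as strings, or None if the line is malformed."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.groups()

fields = parse_line("2013-08-07-21-25 26.0 1015.81")
if fields is not None:
    date, temp, press = fields
    print(date + "\n" + temp + "\n" + press)

# A corrupted line from the question is rejected instead of half-parsed.
print(parse_line("2013-08-11-07-20 23.8 101$96"))
```

Because the pattern is anchored with `^` and `$`, a stray character anywhere in the line makes the whole match fail, which gives the "did I get exactly three good values" test asked for in the question.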
Upvotes: 1
Reputation: 80639
Use re.split, since the fields are separated by horizontal whitespace characters:
date, temp, press = re.split('\s+', "2013-08-07-21-25 26.0 1015.81")
>>> import re
>>> date, temp, press = re.split('\s+', "2013-08-07-21-25 26.0 1015.81")
>>> print date
2013-08-07-21-25
>>> print temp
26.0
>>> print press
1015.81
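Splitting alone does not catch the corrupted lines mentioned in the question's edit, since they still yield three fields. A possible follow-up, sketched here under assumptions, is to validate each field after splitting and discard the line if any conversion fails. The helper name `parse_fields` is hypothetical, and the format string `"%Y-%m-%d-%H-%M"` is inferred from the sample line.

```python
from datetime import datetime

def parse_fields(line):
    """Split on whitespace, then validate each field; return None on any failure."""
    parts = line.split()
    if len(parts) != 3:
        return None
    try:
        # Format inferred from the sample line "2013-08-07-21-25".
        when = datetime.strptime(parts[0], "%Y-%m-%d-%H-%M")
        temp = float(parts[1])
        press = float(parts[2])
    except ValueError:
        # A corrupted timestamp or number raises ValueError.
        return None
    return when, temp, press

print(parse_fields("2013-08-07-21-25 26.0 1015.81"))
print(parse_fields("2013-08-11-07-20 23.8 101$96"))
```

This is stricter than the split alone but not airtight: `float` also accepts strings such as `"nan"` or `"1e5"`, so a range check on the converted values may still be worthwhile.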
Upvotes: 2