Tullio_IRL

Reputation: 99

python regex find/match one or more in a string

I almost can't see anymore for searching google and this site for solutions to my problem.

I want to pick out one or more occurrences of two different substrings from a string:

e.g. 'aSATMPA23.37aSAAWAKE----aSABATT2.05-aSASLEEPING-'

So I'd like to be able to pick out the 'aSATMPA23.37' and if it's there also the 'aSABATT2.05'.

I've tried the following:

import re
serialdata = 'aSATMPA18.5-----aSBBATT2.97-aSBSLEEPING-'
def regex_serialdata(data):                                   
    GrandRegex = re.compile(r'(aS(.)(TMPA)(\d+\.\d+))|(aS(.)(BATT)(\d+\.\d+))')
    match = GrandRegex.match(data)

But this stops after the first match, 'aSATMPA18.5'.

Next I tried using 'findall' method:

def regex_serialdata(data):                                   
    GrandRegex = re.compile(r'(aS(.)(TMPA)(\d+\.\d+))|(aS(.)(BATT)(\d+\.\d+))')      
    match = GrandRegex.findall(data)
    print(match)

Which resulted in: [('aSATMPA18.5', 'A', 'TMPA', '18.5', '', '', '', ''), ('', '', '', '', 'aSBBATT2.97', 'B', 'BATT', '2.97')]

Is there a better way to do this?

Can I access the values within the list of tuples easily?

Please note, I have spent hours on this and don't ask for help lightly.

Much appreciated,

Paul

Upvotes: 2

Views: 126

Answers (4)

Tullio_IRL

Reputation: 99

Thanks to everyone who replied and contributed. With your help I've come up with the following:

import re

serialdata = 'aSATMPA18.5-----aSBBATT2.97-aSBSLEEPING-'

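def regex_serialdata(data):
    # Escape the dot so it only matches a literal decimal point
    GrandRegex = re.compile(r'aS(.)(TMPA|BATT)(\d+\.\d+)')

    match = GrandRegex.findall(data)

    print(match)
    # Each tuple is (device ID, sensor name, reading)
    for x, y, z in match:
        if y == 'TMPA':
            print('Temp is %s' % z)
        elif y == 'BATT':
            print('Battery is %sv' % z)

regex_serialdata(serialdata)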

This produced the following output which is exactly what I want:

[('A', 'TMPA', '18.5'), ('B', 'BATT', '2.97')]
Temp is 18.5
Battery is 2.97v

I'm delighted, it even looks pretty :)
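
One detail worth checking in a pattern like this is the dot between the digits: unescaped, it matches any character, not just a decimal point. A minimal sketch of the difference follows; the `noisy` string is a made-up malformed reading, not real data from my device, and the names `loose` and `strict` are only for illustration.

```python
import re

# '.' matches any character; '\.' matches only a literal dot.
loose = re.compile(r'aS(.)(TMPA|BATT)(\d+.\d+)')
strict = re.compile(r'aS(.)(TMPA|BATT)(\d+\.\d+)')

# Hypothetical malformed reading: a dash where the decimal point should be.
noisy = 'aSATMPA18-5-----aSBBATT2.97-'

loose_hits = loose.findall(noisy)    # also accepts the malformed '18-5'
strict_hits = strict.findall(noisy)  # keeps only the well-formed reading

print(loose_hits)
print(strict_hits)
```

With the escaped dot, the malformed temperature reading is skipped instead of being reported as a value.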

Many thanks,

Paul

Upvotes: 0

Saleem

Reputation: 8978

Try the following regex:

r'(aSA(?:TMPA|BATT))(\d+(?:\.\d+)?)'

Full Code:

import re
p = re.compile(r'(aSA(?:TMPA|BATT))(\d+(?:\.\d+)?)', re.DOTALL)

test_str = """
aSATMPA23.37aSAAWAKE----aSABATT2.05-aSASLEEPING-aSATMPA23.37aSAAWAKE--
--aSABATT2.05-aSASLEEPING-aSATMPA23.37aSAAWAKE---
-aSABATT2.05-aSASLEEPING-aSATMPA23.37aSAAWAKE-
"""

for m in re.finditer(p, test_str):
    print('{0:<15}{1}'.format(m.group(1), m.group(2)))

It will print:

aSATMPA        23.37
aSABATT        2.05
aSATMPA        23.37
aSABATT        2.05
aSATMPA        23.37
aSABATT        2.05
aSATMPA        23.37

Based on your input, it will capture

  • aSATMPA23.37
  • aSABATT2.05
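
Note that the pattern above hard-codes the letter A after aS, so it would skip a reading such as aSBBATT2.97 from the second sample in the question. Below is a rough sketch of a variant that accepts any single character in that position. Treating that character as a one-character device ID is an assumption on my part, and the name READING is just for illustration.

```python
import re

# Assumption: the character right after 'aS' is a one-character device ID.
# The optional (?:\.\d+)? group accepts both '23' and '23.37'.
READING = re.compile(r'aS(.)(TMPA|BATT)(\d+(?:\.\d+)?)')

sample = 'aSATMPA18.5-----aSBBATT2.97-aSBSLEEPING-'

for m in READING.finditer(sample):
    device, sensor, value = m.groups()
    print('{0:<3}{1:<6}{2}'.format(device, sensor, value))
```

Status tokens such as aSBSLEEPING are still ignored, because neither TMPA nor BATT follows the device character.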

Upvotes: 1

Tony Babarino

Reputation: 3405

>>> import re
>>> a = 'aSATMPA23.37aSAAWAKE----aSATMPA15.14-aSASLEEPING-'
>>> re.findall(r'aSATMPA\d+\.\d+',a)
['aSATMPA23.37', 'aSATMPA15.14']

If you place the parentheses as shown below, you get a list of tuples containing the values you want from every match:

>>> a
'aSATMPA23.37aSAAWAKE----aSBBATT2.05-aSASLEEPING-'
>>> b = re.findall(r'(aS)(ATMPA|BBATT)(\d+\.\d+)',a)
>>> b
[('aS', 'ATMPA', '23.37'), ('aS', 'BBATT', '2.05')]
>>> b[0][0]
'aS'
>>> b[0][1]
'ATMPA'
>>> b[0][2]
'23.37'
>>> b[1][0]
'aS'
>>> b[1][1]
'BBATT'
>>> b[1][2]
'2.05'
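
Instead of indexing each tuple by position, the tuples can also be unpacked directly. This is only a sketch: it leaves the constant 'aS' prefix out of the capture groups and converts each reading to a float, and the names pairs and readings are illustrative.

```python
import re

a = 'aSATMPA23.37aSAAWAKE----aSBBATT2.05-aSASLEEPING-'

# Two groups instead of three: the constant 'aS' prefix is not captured.
pairs = re.findall(r'aS(ATMPA|BBATT)(\d+\.\d+)', a)

# Unpack each (name, value) tuple and convert the reading to a float.
# A dict keeps only the last reading per name if a name repeats.
readings = {name: float(value) for name, value in pairs}
print(readings)
```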

Upvotes: 3

Robᵩ

Reputation: 168596

Is there a better way to do this?

Yes. Get rid of all of your parentheses:

import re
serialdata = 'aSATMPA18.5-----aSBBATT2.97-aSBSLEEPING-'
def regex_serialdata(data):
    GrandRegex = re.compile(r'aS.TMPA\d+\.\d+|aS.BATT\d+\.\d+')
    match = GrandRegex.findall(data)
    print(match)

regex_serialdata(serialdata)

Can I access the values within the list of tuples easily?

Yes. From your second example, try print(match[0][0], match[1][4]).
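
If you do want the individual fields rather than the whole matched string, named groups may be easier to read than numeric indexes. This is a sketch under the assumption that the character after aS identifies a device; the group names device, sensor and value are my own choice.

```python
import re

serialdata = 'aSATMPA18.5-----aSBBATT2.97-aSBSLEEPING-'

# Named groups: each match exposes its fields by name via groupdict().
pattern = re.compile(
    r'aS(?P<device>.)(?P<sensor>TMPA|BATT)(?P<value>\d+\.\d+)'
)

results = [m.groupdict() for m in pattern.finditer(serialdata)]
for r in results:
    print(r['device'], r['sensor'], r['value'])
```

Because the two alternatives share one set of groups, there are no empty-string placeholders to skip over.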

Upvotes: 2
