Dalek

Reputation: 4318

Finding a pattern in a list of strings

I have an array with headers, and I obtain the headers with the following command:

>>> headers=list(data.dtype.names)
>>> headers
['SqNr', 'Xpos', 'Ypos', 'ALPHA_J2000', 'DELTA_J2000', \
 'UMAG', 'BMAG', 'VMAG', 'RMAG', 'IMAG', 'MB420MAG', \
 'MB464MAG', 'MB485MAG', 'MB518MAG', 'MB571MAG', 'MB604MAG',\
 'MB646MAG', 'MB696MAG', 'MB753MAG', 'MB815MAG', 'MB855MAG',\
 'MB915MAG', 'UMAG_ERR', 'BMAG_ERR', 'VMAG_ERR', 'RMAG_ERR',\
 'IMAG_ERR', 'MB420MAG_ERR', 'MB464MAG_ERR', 'MB485MAG_ERR',\
 'MB518MAG_ERR', 'MB571MAG_ERR', 'MB604MAG_ERR', 'MB646MAG_ERR',\
 'MB696MAG_ERR', 'MB753MAG_ERR', 'MB815MAG_ERR', 'MB855MAG_ERR',\
 'MB915MAG_ERR', 'PHOTOZ', 'PHOTOZ_ERR', 'PHOTOZ2', 'PHOTOZ2_ERR',\
 'Z_B', 'Z_B_MIN', 'Z_B_MAX', 'T_B', 'ODDS', 'CHISQUARED', 'Z_M',\
 'Z_fp', 'Z_sp', 'Z_s']

I want to make one list of all the strings that end in MAG and another of all the strings that end in MAG_ERR. How could I do that? I was thinking of using the following lines to get the right results:

import re  
pattern='MAG'
re.match(r'(%s)+$' % pattern, "".join(headers))

but it doesn't return anything. How can I get the right answer, which for the MAG list is:

a=['UMAG', 'BMAG', 'VMAG', 'RMAG', 'IMAG', 'MB420MAG',\
   'MB464MAG', 'MB485MAG', 'MB518MAG', 'MB571MAG', 'MB604MAG',\
   'MB646MAG', 'MB696MAG', 'MB753MAG', 'MB815MAG', 'MB855MAG','MB915MAG'] 

Upvotes: 2

Views: 83

Answers (4)

Alex Riley

Reputation: 176810

You could use str.endswith(), which also accepts a tuple of suffixes, to get the headers that end with the required strings:

a = [x for x in headers if x.endswith(("MAG", "MAG_ERR"))]
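A minimal sketch of both uses, run on a hand-picked subset of the question's headers (the subset is only for brevity): a tuple argument matches any of the suffixes and so yields one mixed list, while testing one suffix at a time gives the two separate lists the question asks for.

```python
# A few of the header names from the question (subset, for brevity)
headers = ['SqNr', 'UMAG', 'MB420MAG', 'UMAG_ERR', 'MB420MAG_ERR',
           'PHOTOZ', 'PHOTOZ_ERR', 'Z_B']

# With a tuple, endswith() is True if ANY of the suffixes matches,
# so this single list mixes both kinds of header.
either = [x for x in headers if x.endswith(("MAG", "MAG_ERR"))]

# For two separate lists, test one suffix at a time.
mags = [x for x in headers if x.endswith("MAG")]
mag_errs = [x for x in headers if x.endswith("MAG_ERR")]

print(either)    # ['UMAG', 'MB420MAG', 'UMAG_ERR', 'MB420MAG_ERR']
print(mags)      # ['UMAG', 'MB420MAG']
print(mag_errs)  # ['UMAG_ERR', 'MB420MAG_ERR']
```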

Upvotes: 1

skrrgwasme

Reputation: 9633

Regex solution:

import re

# create list of matches for "MAG"
MAG_matches = [line for line in headers if re.search(r'MAG(?!_ERR)\Z', line)]

# create list of matches for "MAG_ERR"
MAG_ERR_matches = [line for line in headers if re.search(r'MAG_ERR\Z', line)]

Simpler Solution with String Methods:

# create list of matches for "MAG"
MAG_matches = [line for line in headers if line.endswith('MAG')]

# create list of matches for "MAG_ERR"
MAG_ERR_matches = [line for line in headers if line.endswith('MAG_ERR')]
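A quick check, again on a small subset of the headers chosen for illustration, suggests the (?!_ERR) lookahead in the regex version is redundant but harmless: because \Z forces MAG to be the final three characters, nothing can follow it anyway.

```python
import re

# A subset of the question's headers, for illustration
headers = ['UMAG', 'MB915MAG', 'UMAG_ERR', 'MB915MAG_ERR', 'PHOTOZ', 'Z_B_MAX']

with_lookahead = [h for h in headers if re.search(r'MAG(?!_ERR)\Z', h)]
anchored_only = [h for h in headers if re.search(r'MAG\Z', h)]

# \Z already requires "MAG" to end the string, so the lookahead
# never changes which headers are selected.
print(with_lookahead == anchored_only)  # True
print(anchored_only)                    # ['UMAG', 'MB915MAG']
```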

Upvotes: 1

Joe McMahon

Reputation: 3382

If I'm understanding you correctly, you want to build a pattern by selecting the array items that end in MAG, then combine the resulting strings into a single pattern with each item as an alternative.

mags = [ '.*%s$' % x for x in headers if x.endswith('MAG') ]

is a list comprehension that builds the pattern for each item; you then need to join them as alternatives and compile the regex:

import re

mag_alternatives = re.compile('|'.join(mags))

You can now use it:

result = mag_alternatives.match(your_string)
if result is not None:
    # Do something with the match in result here
    print(result.group(0))

I chose this approach because if you want to select a different set of alternatives from the headers, you only need a different function call returning True or False in the comprehension's condition; the rest of the regex construction stays the same.
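The idea of swapping in a different predicate could be wrapped in a small function. This is a sketch only: build_alternation is a hypothetical helper, not part of the answer, and it adds re.escape() so that any regex metacharacters in a header name are matched literally. The header subset is illustrative.

```python
import re


def build_alternation(headers, predicate):
    """Compile one regex whose alternatives are the headers chosen by predicate.

    Hypothetical helper for illustration; re.escape() makes any regex
    metacharacters in a header name match literally.
    """
    selected = ['.*%s$' % re.escape(h) for h in headers if predicate(h)]
    if not selected:
        # '|'.join([]) would be '', a pattern that matches every string.
        raise ValueError("predicate selected no headers")
    return re.compile('|'.join(selected))


# A subset of the question's headers, for illustration
headers = ['UMAG', 'BMAG', 'UMAG_ERR', 'PHOTOZ', 'PHOTOZ_ERR']

# Only the predicate changes between the two regexes.
mag_re = build_alternation(headers, lambda h: h.endswith('MAG'))
err_re = build_alternation(headers, lambda h: h.endswith('_ERR'))

print(mag_re.pattern)                          # .*UMAG$|.*BMAG$
print(mag_re.match('obs_UMAG') is not None)    # True
print(mag_re.match('UMAG_ERR') is not None)    # False
print(err_re.match('PHOTOZ_ERR') is not None)  # True
```

The empty-selection guard matters for this design: without it, a predicate that selects nothing would silently produce a regex that matches everything.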

Upvotes: 0

Richard

Reputation: 61289

Try this:

a=['SqNr', 'Xpos', 'Ypos', 'ALPHA_J2000', 'DELTA_J2000', \
 'UMAG', 'BMAG', 'VMAG', 'RMAG', 'IMAG', 'MB420MAG', \
 'MB464MAG', 'MB485MAG', 'MB518MAG', 'MB571MAG', 'MB604MAG',\
 'MB646MAG', 'MB696MAG', 'MB753MAG', 'MB815MAG', 'MB855MAG',\
 'MB915MAG', 'UMAG_ERR', 'BMAG_ERR', 'VMAG_ERR', 'RMAG_ERR',\
 'IMAG_ERR', 'MB420MAG_ERR', 'MB464MAG_ERR', 'MB485MAG_ERR',\
 'MB518MAG_ERR', 'MB571MAG_ERR', 'MB604MAG_ERR', 'MB646MAG_ERR',\
 'MB696MAG_ERR', 'MB753MAG_ERR', 'MB815MAG_ERR', 'MB855MAG_ERR',\
 'MB915MAG_ERR', 'PHOTOZ', 'PHOTOZ_ERR', 'PHOTOZ2', 'PHOTOZ2_ERR',\
 'Z_B', 'Z_B_MIN', 'Z_B_MAX', 'T_B', 'ODDS', 'CHISQUARED', 'Z_M',\
 'Z_fp', 'Z_sp', 'Z_s']

mags     = list(filter(lambda x: x[-3:] == 'MAG', a))
mag_errs = list(filter(lambda x: x[-7:] == 'MAG_ERR', a))

The x[-3:] slice takes the last three characters of each string and x[-7:] takes the last seven. If these equal MAG or MAG_ERR, respectively, the lambda returns True and filter keeps the corresponding string. (On Python 3, filter returns an iterator, so wrap it in list() to get a list.)
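A short illustration of the slicing behaviour, assuming Python 3 (where filter() is lazy and is wrapped in list() here):

```python
# Negative slices count from the end of the string.
print('MB420MAG'[-3:])      # MAG
print('MB420MAG_ERR'[-7:])  # MAG_ERR

# A slice longer than the string returns the whole string (no IndexError),
# so short names such as 'Z_s' are handled safely.
print('Z_s'[-7:])           # Z_s

# filter() returns an iterator in Python 3; list() materialises it.
headers = ['Z_s', 'UMAG', 'UMAG_ERR']
mags = list(filter(lambda x: x[-3:] == 'MAG', headers))
print(mags)                 # ['UMAG']
```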

You could also use:

mags     = list(filter(lambda x: x.endswith('MAG'), a))
mag_errs = list(filter(lambda x: x.endswith('MAG_ERR'), a))

If you want to use regular expressions, you could use a list comprehension:

import re

mags     = [x for x in a if re.match(r'.*MAG$', x)]
mag_errs = [x for x in a if re.match(r'.*MAG_ERR$', x)]

The MAG$ matches MAG at the end of the string (that's what the $ means), and the .* matches anything before the MAG or MAG_ERR. The .* is needed because re.match only matches at the beginning of the string.
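For comparison, a sketch showing that re.search() with just MAG$ selects the same headers as re.match() with the .* prefix, on an illustrative subset of the headers:

```python
import re

# A subset of the question's headers, for illustration
headers = ['UMAG', 'UMAG_ERR', 'PHOTOZ', 'MB915MAG']

# re.match() is anchored at the start, hence the leading '.*';
# re.search() scans the whole string, so the prefix can be dropped.
via_match = [x for x in headers if re.match(r'.*MAG$', x)]
via_search = [x for x in headers if re.search(r'MAG$', x)]

print(via_match)                # ['UMAG', 'MB915MAG']
print(via_match == via_search)  # True
```

Note that $ also matches just before a trailing newline; \Z is the stricter end-of-string anchor if that distinction matters for your data.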

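Your solution does not work because you merge all of the headers into a single string, so the boundaries between them are lost. On top of that, re.match only matches at the start of that string, and (MAG)+$ requires the whole joined string to consist of repeated MAGs, so it returns None. Using filter or a list comprehension lets you loop through the array and pull out only the items that interest you.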
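To see the failure concretely, here is the question's own call run on a few of the headers (a subset chosen for illustration), followed by a per-header test that does work:

```python
import re

headers = ['SqNr', 'UMAG', 'BMAG', 'UMAG_ERR']  # subset of the question's headers
pattern = 'MAG'

joined = "".join(headers)  # 'SqNrUMAGBMAGUMAG_ERR'

# The question's call: anchored at position 0, and (MAG)+$ needs everything
# from there to the end to be repeated 'MAG's, so it fails on the joined headers.
print(re.match(r'(%s)+$' % pattern, joined))  # None

# It only succeeds on strings made up entirely of 'MAG' repeats.
print(re.match(r'(%s)+$' % pattern, 'MAGMAG') is not None)  # True

# Testing each header on its own gives the intended list.
mags = [h for h in headers if re.search(r'%s$' % pattern, h)]
print(mags)  # ['UMAG', 'BMAG']
```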

Upvotes: 4
