Reputation: 4318
I have an array with headers, and I obtain the headers with the following command:
>>> headers=list(data.dtype.names)
>>> headers
['SqNr', 'Xpos', 'Ypos', 'ALPHA_J2000', 'DELTA_J2000', \
'UMAG', 'BMAG', 'VMAG', 'RMAG', 'IMAG', 'MB420MAG', \
'MB464MAG', 'MB485MAG', 'MB518MAG', 'MB571MAG', 'MB604MAG',\
'MB646MAG', 'MB696MAG', 'MB753MAG', 'MB815MAG', 'MB855MAG',\
'MB915MAG', 'UMAG_ERR', 'BMAG_ERR', 'VMAG_ERR', 'RMAG_ERR',\
'IMAG_ERR', 'MB420MAG_ERR', 'MB464MAG_ERR', 'MB485MAG_ERR',\
'MB518MAG_ERR', 'MB571MAG_ERR', 'MB604MAG_ERR', 'MB646MAG_ERR',\
'MB696MAG_ERR', 'MB753MAG_ERR', 'MB815MAG_ERR', 'MB855MAG_ERR',\
'MB915MAG_ERR', 'PHOTOZ', 'PHOTOZ_ERR', 'PHOTOZ2', 'PHOTOZ2_ERR',\
'Z_B', 'Z_B_MIN', 'Z_B_MAX', 'T_B', 'ODDS', 'CHISQUARED', 'Z_M',\
'Z_fp', 'Z_sp', 'Z_s']
I want to build one list containing all the strings that end in MAG and another containing those that end in MAG_ERR. How could I do that?
I was thinking of using the following lines to get the right results:
import re
pattern='MAG'
re.match(r'(%s)+$' % pattern, "".join(headers))
but it doesn't return anything. How can I get the desired result, which for the MAG case is:
a=['UMAG', 'BMAG', 'VMAG', 'RMAG', 'IMAG', 'MB420MAG',\
'MB464MAG', 'MB485MAG', 'MB518MAG', 'MB571MAG', 'MB604MAG',\
'MB646MAG', 'MB696MAG', 'MB753MAG', 'MB815MAG', 'MB855MAG','MB915MAG']
Upvotes: 2
Views: 83
Reputation: 176810
You could use str.endswith() to get the headers which end with the required strings. Passing a tuple of suffixes returns the headers matching either one in a single list:
a = [x for x in headers if x.endswith(("MAG", "MAG_ERR"))]
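Since the question asks for two separate lists, a minimal sketch of the same idea with one comprehension per suffix might look like the following. The short `headers` list here is a made-up subset of the original, used only for illustration.

```python
# Hypothetical subset of the headers from the question, for illustration only.
headers = ['SqNr', 'UMAG', 'BMAG', 'UMAG_ERR', 'BMAG_ERR', 'PHOTOZ', 'PHOTOZ_ERR']

# One list per suffix. 'UMAG_ERR' ends in 'ERR', not 'MAG', so the two
# lists cannot overlap.
mags = [x for x in headers if x.endswith('MAG')]
mag_errs = [x for x in headers if x.endswith('MAG_ERR')]

print(mags)      # ['UMAG', 'BMAG']
print(mag_errs)  # ['UMAG_ERR', 'BMAG_ERR']
```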
Upvotes: 1
Reputation: 9633
Regex solution:
import re
# create list of matches for "MAG"
MAG_matches = [line for line in headers if re.search(r'MAG(?!_ERR)\Z', line)]
# create list of matches for "MAG_ERR"
MAG_ERR_matches = [line for line in headers if re.search(r'MAG_ERR\Z', line)]
Simpler Solution with String Methods:
# create list of matches for "MAG"
MAG_matches = [line for line in headers if line.endswith('MAG')]
# create list of matches for "MAG_ERR"
MAG_ERR_matches = [line for line in headers if line.endswith('MAG_ERR')]
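As a quick sanity check, the sketch below compares the regex and string-method approaches on a small, made-up subset of the headers. It also uses the plain pattern `MAG\Z` without the lookahead: because `\Z` already requires `MAG` to be the last thing in the string, a name such as `VMAG_ERR` cannot match anyway.

```python
import re

# Hypothetical subset of the headers from the question, for illustration only.
headers = ['Xpos', 'VMAG', 'MB420MAG', 'VMAG_ERR', 'MB420MAG_ERR', 'Z_B']

# Regex version: \Z anchors the pattern at the very end of the string.
mag_by_regex = [h for h in headers if re.search(r'MAG\Z', h)]
err_by_regex = [h for h in headers if re.search(r'MAG_ERR\Z', h)]

# String-method version.
mag_by_method = [h for h in headers if h.endswith('MAG')]
err_by_method = [h for h in headers if h.endswith('MAG_ERR')]

print(mag_by_regex == mag_by_method)  # True
print(err_by_regex == err_by_method)  # True
print(mag_by_regex)                   # ['VMAG', 'MB420MAG']
```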
Upvotes: 1
Reputation: 3382
If I'm understanding you, you want to construct a pattern by selecting the array items ending in MAG and then combining the resulting list of strings into a single pattern, with each item as an alternative.
mags = [ '.*%s$' % x for x in headers if x.endswith('MAG') ]
is a list comprehension that builds the pattern for each item; you then need to join them with | as alternatives and compile the regex:
mag_alternatives = re.compile( '|'.join(mags) )
You can now use it:
result = mag_alternatives.match(your_string)
if result is not None:
    # Do something with the match in result here
    pass
I chose this approach because if you want to select a different set of alternatives out of the header, you just need a different function call returning True or False in the comprehension. The rest of the construction of the regex remains the same.
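Put together, the approach above could be sketched roughly as follows. The short `headers` list and the candidate strings are invented for illustration, and `re.escape` is an addition not in the original snippet: it is only a precaution in case a header name ever contains regex metacharacters.

```python
import re

# Hypothetical subset of the headers from the question, for illustration only.
headers = ['SqNr', 'UMAG', 'BMAG', 'UMAG_ERR', 'PHOTOZ']

# Build one sub-pattern per selected header, escaping the name so it is
# treated literally, then join the sub-patterns as alternatives.
mags = ['.*%s$' % re.escape(x) for x in headers if x.endswith('MAG')]
mag_alternatives = re.compile('|'.join(mags))

print(mag_alternatives.pattern)  # .*UMAG$|.*BMAG$

# Made-up candidate strings to test against the compiled pattern.
for candidate in ['flux_UMAG', 'UMAG_ERR']:
    result = mag_alternatives.match(candidate)
    print(candidate, result is not None)
```

Swapping `x.endswith('MAG')` for a different predicate changes which headers feed into the pattern without touching the rest.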
Upvotes: 0
Reputation: 61289
Try this:
a=['SqNr', 'Xpos', 'Ypos', 'ALPHA_J2000', 'DELTA_J2000', \
'UMAG', 'BMAG', 'VMAG', 'RMAG', 'IMAG', 'MB420MAG', \
'MB464MAG', 'MB485MAG', 'MB518MAG', 'MB571MAG', 'MB604MAG',\
'MB646MAG', 'MB696MAG', 'MB753MAG', 'MB815MAG', 'MB855MAG',\
'MB915MAG', 'UMAG_ERR', 'BMAG_ERR', 'VMAG_ERR', 'RMAG_ERR',\
'IMAG_ERR', 'MB420MAG_ERR', 'MB464MAG_ERR', 'MB485MAG_ERR',\
'MB518MAG_ERR', 'MB571MAG_ERR', 'MB604MAG_ERR', 'MB646MAG_ERR',\
'MB696MAG_ERR', 'MB753MAG_ERR', 'MB815MAG_ERR', 'MB855MAG_ERR',\
'MB915MAG_ERR', 'PHOTOZ', 'PHOTOZ_ERR', 'PHOTOZ2', 'PHOTOZ2_ERR',\
'Z_B', 'Z_B_MIN', 'Z_B_MAX', 'T_B', 'ODDS', 'CHISQUARED', 'Z_M',\
'Z_fp', 'Z_sp', 'Z_s']
# list(...) is needed on Python 3, where filter returns a lazy iterator
mags = list(filter(lambda x: x[-3:]=='MAG', a))
mag_errs = list(filter(lambda x: x[-7:]=='MAG_ERR', a))
The x[-3:] pulls out the last three characters of each string and x[-7:] pulls out the last seven characters. If these match MAG or MAG_ERR, respectively, the lambda returns true and filter keeps the corresponding string in the output.
You could also use:
mags = list(filter(lambda x: x.endswith('MAG'), a))
mag_errs = list(filter(lambda x: x.endswith('MAG_ERR'), a))
If you want to use regular expressions, you could use a list comprehension:
import re
mags = [x for x in a if re.match(r'.*MAG$', x)]
mag_errs = [x for x in a if re.match(r'.*MAG_ERR$', x)]
The MAG$ matches a MAG at the end of the string (that's what the $ means) and the .* matches anything before the MAG or MAG_ERR. The leading .* is needed because re.match only matches starting at the beginning of the string.
Your solution will not work because you merge all of the headers into a single string, making it difficult to separate them later. Using filter or a list comprehension allows you to implicitly loop through the array, pulling out those items which are of interest to you.
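To make the failure concrete, here is a small sketch using a made-up four-item subset of the headers. The joined string starts with 'SqNr', and re.match only tries to match at the start of the string, so the original pattern cannot succeed; testing each header on its own works.

```python
import re

# Hypothetical subset of the headers from the question, for illustration only.
headers = ['SqNr', 'UMAG', 'UMAG_ERR', 'Z_s']
pattern = 'MAG'

# The original attempt: all headers merged into one string.
joined = "".join(headers)  # 'SqNrUMAGUMAG_ERRZ_s'
whole = re.match(r'(%s)+$' % pattern, joined)
print(whole)  # None

# Testing each header separately keeps the item boundaries intact.
per_item = [h for h in headers if re.search(r'%s$' % pattern, h)]
print(per_item)  # ['UMAG']
```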
Upvotes: 4