Reputation: 830
I have a list of data in a CSV file.
I am using a loop to split each row into fields. I need code that finds any field that does not match the format "[A9]9AA9#9", where "A" can be any letter and "9" can be any digit. The [ ] and # symbols must be in the same positions as in the format.
def code():
    match = 0
    tree_file.readline()  # skip first row
    for row in tree_file:
        field = row.strip()
        field = field.split(",")  # make into fields
        code = field[4]
        if code != "[X9]9XX9#9":  # HERE SHOULD BE THE CODE
            match += 1
Please leave some comments in the code so I can understand it, because I can't see how the other solutions that are available relate to my problem.
Upvotes: 0
Views: 1917
Reputation: 35960
The regular expression you need is:
r'\[[A-Za-z][0-9]+\][0-9]+[A-Za-z]{2}[0-9]+#[0-9]+'
So the code can look like this:
import re
if re.search(r'\[[A-Za-z][0-9]+\][0-9]+[A-Za-z]{2}[0-9]+#[0-9]+', code) is None:
    match += 1
Explanation
\[ and \] : match the literal brackets
[A-Za-z] : matches any single letter, upper or lower case
[0-9]+ : matches one or more digits
[A-Za-z]{2} : matches exactly two letters
Output
>>> import re
>>> s = "[X9]9XX9#9"
>>> re.search(r'\[[A-Za-z][0-9]+\][0-9]+[A-Za-z]{2}[0-9]+#[0-9]+', s) is None
False
>>> s = "ABCD"
>>> re.search(r'\[[A-Za-z][0-9]+\][0-9]+[A-Za-z]{2}[0-9]+#[0-9]+', s) is None
True
>>> s = "[A123]456BB8#789"
>>> re.search(r'\[[A-Za-z][0-9]+\][0-9]+[A-Za-z]{2}[0-9]+#[0-9]+', s) is None
False
>>>
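If each "9" in the format stands for exactly one digit, as the question describes, a stricter check is possible. The following is a minimal sketch rather than a definitive version: it assumes Python 3.4+ (for re.fullmatch), and the names STRICT_CODE and is_valid_code are made up for illustration.

```python
import re

# Literal '[', one letter, one digit, literal ']', one digit, two letters,
# one digit, a literal '#', and a final digit.
STRICT_CODE = re.compile(r'\[[A-Za-z][0-9]\][0-9][A-Za-z]{2}[0-9]#[0-9]')

def is_valid_code(code):
    """Return True only if the whole string has the form [A9]9AA9#9."""
    # fullmatch requires the pattern to cover the entire string, so extra
    # characters before or after the code make the check fail.
    return STRICT_CODE.fullmatch(code) is not None

print(is_valid_code("[X9]9XX9#9"))        # True
print(is_valid_code("[A123]456BB8#789"))  # False: more than one digit per slot
print(is_valid_code("xx[X9]9XX9#9"))      # False: leading junk
```

By contrast, re.search with the + quantifiers accepts the second and third strings, because it allows runs of digits and looks for a match anywhere in the input.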
Upvotes: 1
Reputation: 37269
You could try the following regular expression. It accepts lowercase and uppercase letters ([a-zA-Z]) and digits (\d) in their respective places. We first compile the pattern, which is the regular expression we are trying to match. You then use re.match to attempt to match the input string against the pattern. If the pattern matches, the group() method returns the matched text. If it doesn't, re.match() returns None (which you can handle better than I did below :) ):
In [11]: import re
In [12]: pattern = re.compile(r'\[[a-zA-Z]\d\]\d[a-zA-Z]{2}\d#\d')
In [13]: re.match(pattern, '[X9]9XX9#9').group()
Out[13]: '[X9]9XX9#9'
In [14]: re.match(pattern, '[Z7]3JK2#1').group()
Out[14]: '[Z7]3JK2#1'
In [15]: re.match(pattern, '[ZZ]3JK2#1').group()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-15-48efdbbda230> in <module>()
----> 1 re.match(pattern, '[ZZ]3JK2#1').group()
AttributeError: 'NoneType' object has no attribute 'group'
One way to handle the non-matching case is to assign the result to a variable and then process based on whether it returns anything or not:
In [16]: match = re.match(pattern, '[ZZ]3JK2#1')
In [17]: if match:
    ...:     print(match.group())
    ...:
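To tie this back to the loop in the question, here is a hedged sketch of counting the rows whose code does not match. It assumes Python 3, that the code sits in the fifth column (index 4) as in the question, and that the first row is a header. The function name count_bad_codes and the sample data are hypothetical.

```python
import csv
import io
import re

# Same shape as the pattern above: [A9]9AA9#9 with single letters and digits.
CODE_PATTERN = re.compile(r'\[[a-zA-Z]\d\]\d[a-zA-Z]{2}\d#\d')

def count_bad_codes(tree_file, column=4):
    """Count rows whose code field does not have the form [A9]9AA9#9."""
    reader = csv.reader(tree_file)  # splits each row into fields
    next(reader, None)              # skip the header row
    bad = 0
    for fields in reader:
        # A row that is too short has no code at all, so treat it as bad.
        code = fields[column].strip() if len(fields) > column else ""
        # fullmatch returns None when the whole field does not fit the pattern.
        if CODE_PATTERN.fullmatch(code) is None:
            bad += 1
    return bad

# Made-up sample data standing in for the real CSV file.
sample = io.StringIO(
    "id,name,x,y,code\n"
    "1,oak,0,0,[X9]9XX9#9\n"
    "2,elm,0,1,[ZZ]3JK2#1\n"
    "3,ash,1,1,ABCD\n"
)
print(count_bad_codes(sample))  # 2
```

With a real file, the same function could be called as count_bad_codes(open("trees.csv", newline="")), where the file name is only a placeholder.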
Upvotes: 2
Reputation: 34677
reg = re.compile(r'\[[A-Z][0-9]\][0-9][A-Z]{2}[0-9]#[0-9]')
works for me...
Inside a character class, the - operator defines a range. The literal [ and ] must be escaped because, by definition, an unescaped [] matches a set of characters. Note that [A-Z] accepts only uppercase letters. You could also use Unicode character classes to do this if you need a locale-independent solution.
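The escaping rule can also be applied mechanically. As a rough sketch (the helper name template_to_regex is invented here, and treating "A" as any ASCII letter of either case is an assumption), the format string from the question can be turned into a pattern by replacing "A" and "9" with character classes and passing every other character through re.escape:

```python
import re

def template_to_regex(template):
    """Build a compiled regex from a template such as '[A9]9AA9#9'."""
    parts = []
    for ch in template:
        if ch == "A":
            parts.append("[A-Za-z]")    # placeholder for one letter
        elif ch == "9":
            parts.append("[0-9]")       # placeholder for one digit
        else:
            parts.append(re.escape(ch)) # literal character, escaped if special
    return re.compile("".join(parts))

reg = template_to_regex("[A9]9AA9#9")
print(bool(reg.fullmatch("[X9]9XX9#9")))  # True
print(bool(reg.fullmatch("[ZZ]3JK2#1")))  # False: second slot must be a digit
```

This keeps the brackets and the # in fixed positions without having to remember which characters need a backslash.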
Upvotes: 1