Raitis Kupce

Reputation: 830

Python: find fields that don't match a specific format

I have a list of data in a CSV file.

I am using a loop to split each row into fields. I need code that finds any field that does not match this format: "[A9]9AA9#9". "A" can be any letter and "9" can be any number, but the [, ] and # symbols must be in the same positions as in the format.

def code():
    match = 0
    tree_file.readline()  # skip first row
    for row in tree_file:
        field = row.strip()
        field = field.split(",")  # make into fields
        code = field[4]
        if code != "[X9]9XX9#9":  # HERE SHOULD BE THE CODE
            match += 1
    return match

Could you please leave some comments in the code so I can understand it? I can't see how the other available solutions relate to my problem.

Upvotes: 0

Views: 1917

Answers (3)

ATOzTOA

Reputation: 35960

The regular expression you need is:

r'\[[A-Za-z][0-9]+\][0-9]+[A-Za-z]{2}[0-9]+#[0-9]+'

So the code could look like this:

import re

if re.search(r'\[[A-Za-z][0-9]+\][0-9]+[A-Za-z]{2}[0-9]+#[0-9]+', code) is None:
    match += 1  

Explanation

\[ and \] : match literal square brackets
[A-Za-z] : matches any letter
[0-9]+ : matches one or more digits
[A-Za-z]{2} : matches exactly two letters
# : matches a literal hash
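The pattern above lets each "9" be several digits, which matches the "[A123]456BB8#789" example below. If each "9" should be exactly one digit (the question is ambiguous here), a stricter sketch could look like this. The names CODE_PATTERN and is_valid_code are illustrative, not from the answer; re.VERBOSE is used so each piece of the pattern can carry its own comment, as the question asked.

```python
import re

# Strict version of the format "[A9]9AA9#9": exactly one character per position.
# re.VERBOSE ignores whitespace in the pattern and allows comments after an unescaped #.
CODE_PATTERN = re.compile(r"""
    \[          # literal opening bracket (escaped, otherwise it starts a class)
    [A-Za-z]    # one letter
    [0-9]       # one digit
    \]          # literal closing bracket
    [0-9]       # one digit
    [A-Za-z]{2} # exactly two letters
    [0-9]       # one digit
    \#          # literal hash (escaped, because in VERBOSE mode it starts a comment)
    [0-9]       # one digit
""", re.VERBOSE)


def is_valid_code(code):
    # fullmatch succeeds only if the WHOLE string fits the pattern
    return CODE_PATTERN.fullmatch(code) is not None
```

With this version "[X9]9XX9#9" is accepted, while "[A123]456BB8#789" and "ABCD" are rejected. re.fullmatch needs Python 3.4 or later.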

Output

>>> import re
>>> s = "[X9]9XX9#9"
>>> re.search(r'\[[A-Za-z][0-9]+\][0-9]+[A-Za-z]{2}[0-9]+#[0-9]+', s) is None
False
>>> s = "ABCD"
>>> re.search(r'\[[A-Za-z][0-9]+\][0-9]+[A-Za-z]{2}[0-9]+#[0-9]+', s) is None
True
>>> s = "[A123]456BB8#789"
>>> re.search(r'\[[A-Za-z][0-9]+\][0-9]+[A-Za-z]{2}[0-9]+#[0-9]+', s) is None
False
>>> 
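One thing to keep in mind: re.search scans the whole string, so a field that merely contains a valid code somewhere inside it will still count as matching. A small sketch of the difference, using the same pattern and a made-up field value:

```python
import re

loose = r'\[[A-Za-z][0-9]+\][0-9]+[A-Za-z]{2}[0-9]+#[0-9]+'

field = "junk[X9]9XX9#9junk"

# re.search finds the valid code inside the surrounding text
print(re.search(loose, field) is None)     # False

# re.fullmatch requires the entire field to match the pattern
print(re.fullmatch(loose, field) is None)  # True
```

Anchoring the pattern with ^ and \Z would have the same effect as fullmatch. A plain $ also matches just before a trailing newline, so it is slightly looser.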

Upvotes: 1

RocketDonkey

Reputation: 37269

You could try the following regular expression. It accepts lowercase and uppercase letters ([a-zA-Z]) and digits (\d) in their respective places. We first compile the pattern, which is the regular expression we are trying to match (see here for a much more detailed regex explanation). You then use re.match to try to match the input string against the pattern. If the pattern matches, re.match returns a match object, and its group() method returns the matched text. If it doesn't match, re.match returns None (which you can handle better than I did below :) ):

In [11]: import re

In [12]: pattern = re.compile(r'\[[a-zA-Z]\d\]\d[a-zA-Z]{2}\d#\d')

In [13]: re.match(pattern, '[X9]9XX9#9').group()
Out[13]: '[X9]9XX9#9'

In [14]: re.match(pattern, '[Z7]3JK2#1').group()
Out[14]: '[Z7]3JK2#1'

In [15]: re.match(pattern, '[ZZ]3JK2#1').group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-48efdbbda230> in <module>()
----> 1 re.match(pattern, '[ZZ]3JK2#1').group()

AttributeError: 'NoneType' object has no attribute 'group'
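A note on the letter class: it is easy to type [a-zA-z] (lowercase z at the end of the second range) by mistake. Because ranges follow character codes, A-z runs from 'A' (65) to 'z' (122) and silently includes six punctuation characters, as this quick check shows:

```python
import re

# 'Z' is 90 and 'a' is 97, so A-z also covers the characters 91..96: [ \ ] ^ _ `
for ch in "[\\]^_`":
    print(repr(ch), re.fullmatch(r'[a-zA-z]', ch) is not None)  # True for each

# With A-Z the class contains letters only
print(re.fullmatch(r'[a-zA-Z]', '_') is not None)  # False
```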

One way to handle the non-matching case is to assign the result to a variable and then process based on whether it returns anything or not:

In [16]: match = re.match(pattern, '[ZZ]3JK2#1')

In [17]: if match:
   ...:     print(match.group())
   ...:
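Putting this together with the loop from the question, a commented sketch that counts the rows whose code field does not fit the format might look like the following. count_bad_codes is a hypothetical name. The sketch assumes the code sits in column index 4 (as in the question's field[4]) and that each "9" is exactly one digit. It uses the standard csv module instead of split(","), because that also handles quoted fields containing commas.

```python
import csv
import io
import re

# Strict, single-character version of the format "[A9]9AA9#9"
# (assumes each "9" is exactly one digit).
CODE_RE = re.compile(r'\[[A-Za-z][0-9]\][0-9][A-Za-z]{2}[0-9]#[0-9]')


def count_bad_codes(tree_file, column=4):
    """Return how many data rows have a field in `column` that does not fit the format."""
    bad = 0
    reader = csv.reader(tree_file)  # csv.reader splits each line into a list of fields
    next(reader, None)              # skip the header row
    for fields in reader:
        if len(fields) <= column:   # row too short to have that column: count it as bad
            bad += 1
            continue
        code = fields[column].strip()
        # fullmatch returns None unless the WHOLE field fits the pattern
        if CODE_RE.fullmatch(code) is None:
            bad += 1
    return bad


# Usage with an in-memory CSV; column index 4 holds the code
sample = io.StringIO(
    "id,a,b,c,code\n"
    "1,x,y,z,[X9]9XX9#9\n"
    "2,x,y,z,[ZZ]3JK2#1\n"
    "3,x,y,z,[Z7]3JK2#1\n"
)
print(count_bad_codes(sample))  # 1
```

With a real file, open it with open(path, newline='') before passing it in, as the csv module documentation recommends.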

Upvotes: 2

hd1

Reputation: 34677

reg = re.compile(r'\[[A-Z][0-9]\][0-9][A-Z]{2}[0-9]#[0-9]') works for me (note that it only accepts uppercase letters).

Inside a character class, - defines a range. The literal square brackets must be escaped because, by definition, unescaped [] define a character class that matches one character from a set. You could also use Unicode character classes if you need a locale-independent solution.
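Since the escaping rule is what trips people up, one option is to let Python build the regex from the format string itself. The sketch below uses a hypothetical helper, format_to_regex, that is not part of any answer here. It assumes "A" means any ASCII letter and "9" means a single digit, and it passes every other character through re.escape so that [, ] and # are matched literally.

```python
import re


def format_to_regex(template):
    """Build a compiled regex from a template where 'A' = any letter, '9' = any digit.

    Every other character (such as '[', ']' or '#') must appear literally,
    so it is escaped with re.escape.
    """
    parts = []
    for ch in template:
        if ch == 'A':
            parts.append('[A-Za-z]')
        elif ch == '9':
            parts.append('[0-9]')
        else:
            parts.append(re.escape(ch))
    return re.compile(''.join(parts))


pattern = format_to_regex("[A9]9AA9#9")
print(pattern.fullmatch("[Q4]7RT0#3") is not None)  # True
print(pattern.fullmatch("[QQ]7RT0#3") is not None)  # False
```

The advantage is that the format stays readable and can be changed in one place. The limitation is that a literal "A" or "9" cannot appear in the template.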

Upvotes: 1
