marsx

Reputation: 715

Regular expression in loops

I have two lists of data whose dates I want to compare. I tried using a regular expression inside a loop to find the entry in L2 that corresponds to each entry in L1. Each entry is a string of the form 'code, name, date', and I want to match an entry from L1 with the entry in L2 that begins with the same code. I wrote the regular expression like this:

for line in L2:
    if re.match((code), line):

where 'code' is the part of the string from L1 that I want to match. It works for most of the entries, with one hitch. Say one code is 'ABC' and another is 'ABCD'. When it searches for 'ABC', it also matches 'ABCD', so I get two entries for 'ABC' (one with the correct information and one with the information for 'ABCD'), followed by yet another entry for 'ABCD'. Is there a way to make sure that the regular expression matches only the exact code? The code is always at the beginning of every entry, if that makes a difference.

Upvotes: 0

Views: 542

Answers (2)

Steve Wortham

Reputation: 22220

You can perform an exact match with a regular expression like ^ABC\b, where ^ anchors the match to the beginning of the string and \b is a word boundary. I think that'll get you what you want. In the case of ABC, the word boundary falls immediately before the comma.
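As a rough sketch of how the word-boundary pattern might be used in Python: the sample entries and the helper name find_entry below are made up for illustration, and re.escape is an added precaution (not part of the pattern above) in case a code contains regex metacharacters.

```python
import re

def find_entry(code, entries):
    """Return the entries whose leading code is exactly `code` (hypothetical helper)."""
    # re.match already starts at the beginning of the string, so ^ is optional here.
    # \b keeps 'ABC' from matching the first three letters of 'ABCD'.
    pattern = re.compile(r'^' + re.escape(code) + r'\b')
    return [entry for entry in entries if pattern.match(entry)]

# Made-up sample data in the 'code, name, date' format
entries = ['ABC, Alpha, 2010-01-01', 'ABCD, Beta, 2010-02-01']

print(find_entry('ABC', entries))
print(find_entry('ABCD', entries))
```

One caveat: \b only separates word characters (letters, digits, underscore) from non-word characters, so a code like 'AB' would still match an entry starting with 'AB-C'. If codes can contain hyphens or other punctuation, a pattern that requires the comma is probably safer.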

Upvotes: 1

You can use an anchor to achieve this:

re.match('^' + code + r'(?=\s*,)', line)

^ anchors the match to the beginning of the line, and (?=\s*,) means that the match must be followed by any amount of whitespace, followed by a comma.

This way you ensure that the code matches the code in its entirety, and also that it doesn't match any of the other fields.
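Here is a minimal sketch of the lookahead version inside the kind of nested loop the question describes. The list contents and the names list1, list2, and matches are assumptions for illustration, as is the use of re.escape and of split(',') to pull the code out of each entry.

```python
import re

# Made-up sample data in the 'code, name, date' format
list1 = ['ABC, Alpha, 2010-01-01', 'ABCD, Beta, 2010-02-01']
list2 = ['ABCD, Beta, 2011-02-01', 'ABC, Alpha, 2011-01-01']

matches = []
for entry in list1:
    # The code is everything before the first comma
    code = entry.split(',')[0].strip()
    # Lookahead: the code must be followed by optional whitespace and a comma
    pattern = re.compile('^' + re.escape(code) + r'(?=\s*,)')
    for line in list2:
        if pattern.match(line):
            matches.append((entry, line))

for pair in matches:
    print(pair)
```

With this data, each entry in list1 should pair with exactly one entry in list2, because 'ABC' is no longer allowed to match the start of 'ABCD'.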


Upvotes: 1
