Mod

Reputation: 65

Apart from returning a list versus an iterator, do re.findall() and re.finditer() in Python also differ in how they work?

I wrote the following code to get all variable-length runs of the pattern in str_key.

import re

line = "ABCDABCDABCDXXXABCDXXABCDABCDABCD"
str_key = "ABCD"
regex = rf"({str_key})+"

find_all_found = re.findall(regex,line)
print(find_all_found)

find_iter_found = re.finditer(regex, line)
for i in find_iter_found:
    print(i.group())

Output I got:

['ABCD', 'ABCD', 'ABCD']
ABCDABCDABCD
ABCD
ABCDABCDABCD

The intended output is the last three lines, printed by finditer(). I was expecting both functions to give me the same output (list or iterator does not matter). Why does findall() differ? As far as I understood from other posts on Stack Overflow, these two functions differ only in their return types and not in how they match patterns. Do they work differently? If not, what have I done wrong?

Upvotes: 2

Views: 641

Answers (2)

Toto

Reputation: 91438

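For re.findall, change your regex from

  • regex = rf"({str_key})+"

to

  • regex = rf"((?:{str_key})+)"

The quantifier + has to be inside the capture group. A repeated capture group keeps only its last repetition, and re.findall returns the group's contents rather than the whole match when the pattern contains a group.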
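As a minimal sketch of the change above, using the question's own data, the two patterns can be compared side by side. The variable names repeated_group, whole_run, before and after are only illustrative:

```python
import re

line = "ABCDABCDABCDXXXABCDXXABCDABCDABCD"
str_key = "ABCD"

# Quantifier outside the capture group: the group is overwritten on
# every repetition, so only the last "ABCD" of each run is captured.
repeated_group = rf"({str_key})+"

# Inner group made non-capturing with (?:...), and the quantifier moved
# inside an outer capture group that spans the whole run.
whole_run = rf"((?:{str_key})+)"

before = re.findall(repeated_group, line)
after = re.findall(whole_run, line)

print(before)  # ['ABCD', 'ABCD', 'ABCD']
print(after)   # ['ABCDABCDABCD', 'ABCD', 'ABCDABCDABCD']
```

If str_key could contain regex metacharacters, wrapping it in re.escape() before building the pattern would be a sensible precaution; that is not needed for "ABCD".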

Upvotes: 1

rudolfovic

Reputation: 3276

To get the same values from finditer() that findall() returns, access groups() rather than group():

>>> find_iter_found = re.finditer(regex, line)
>>> for i in find_iter_found:
...     print(i.groups()[0])
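A small sketch of the distinction, assuming the question's original pattern with the key written out literally: group() (equivalent to group(0)) is the whole match, while groups()[0] (equivalent to group(1)) is what the first capture group held at the end of the match, which is also what findall() reports. The names whole and captured are illustrative:

```python
import re

line = "ABCDABCDABCDXXXABCDXXABCDABCDABCD"
regex = r"(ABCD)+"

matches = list(re.finditer(regex, line))

# Whole match: everything the pattern consumed.
whole = [m.group() for m in matches]

# First capture group: only the last repetition of (ABCD).
captured = [m.groups()[0] for m in matches]

print(whole)     # ['ABCDABCDABCD', 'ABCD', 'ABCDABCDABCD']
print(captured)  # ['ABCD', 'ABCD', 'ABCD']

# findall() returns the group contents, so it agrees with `captured`.
print(captured == re.findall(regex, line))  # True
```

So both functions find the same three matches; they only report different parts of each match.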

The difference between the two methods is explained here.

As far as the matching process is concerned, the two functions behave the same way. They differ in what they return, as the documentation describes:

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

Changed in version 3.7: Non-empty matches can now start just after a previous empty match.

re.finditer(pattern, string, flags=0)

Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.

Changed in version 3.7: Non-empty matches can now start just after a previous empty match.
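The return-shape rule quoted above can be illustrated with a short sketch. The sample string and patterns here are made up for the illustration and are not from the question:

```python
import re

text = "a1 b2 c3"

# No groups: findall() returns the whole matches.
no_group = re.findall(r"[a-z]\d", text)

# One group: findall() returns only that group's contents.
one_group = re.findall(r"([a-z])\d", text)

# Two groups: findall() returns a list of tuples, one per match.
two_groups = re.findall(r"([a-z])(\d)", text)

# finditer() yields match objects, so the whole match is always
# available through group(0), however many groups the pattern has.
via_iter = [m.group(0) for m in re.finditer(r"([a-z])(\d)", text)]

print(no_group)    # ['a1', 'b2', 'c3']
print(one_group)   # ['a', 'b', 'c']
print(two_groups)  # [('a', '1'), ('b', '2'), ('c', '3')]
print(via_iter)    # ['a1', 'b2', 'c3']
```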

Upvotes: 2
