Reputation: 19

Why Does This Code Return A Tuple With 2 Elements?

import re

my_string = "C2H6O"
a = re.findall("((Cl|H|O|C|N)[0-9]*)", my_string)
print(a)

The output is [("C2", "C"), ("H6", "H"), ("O", "O")], but I expected ["C2", "H6", "O"].

I understand that findall can return tuples, but I don't see what in this pattern produces the second element of each tuple, such as the "C" in ("C2", "C").

Upvotes: 0

Answers (2)

Reputation: 38937

You can change your regular expression to:

re.findall("([Cl|H|O|C|N][0-9]*)", my_string)

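You will get what you expect for this input, because the pattern now has only one group. Be aware that square brackets form a character class: [Cl|H|O|C|N] matches a single character out of C, l, |, H, O and N, not the alternatives Cl, H, O, C, N, so "Cl" is never matched as one symbol.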
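
To see where the character class and the alternation differ, here is a small sketch. The input "CCl4" is a hypothetical example chosen because it contains chlorine; it is not from the question.

```python
import re

# Hypothetical input containing chlorine (not from the question).
formula = "CCl4"

# Character class: matches ONE character from the set C, l, |, H, O, N,
# followed by optional digits.
char_class = re.findall(r"([Cl|H|O|C|N][0-9]*)", formula)

# Alternation inside a non-capturing group: "Cl" is matched as a unit.
alternation = re.findall(r"(?:Cl|H|O|C|N)[0-9]*", formula)

print(char_class)   # ['C', 'C', 'l4']
print(alternation)  # ['C', 'Cl4']
```

With the character class, "Cl4" is split into "C" and "l4", which is probably not what you want for chemical formulas.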

Upvotes: -1

Reputation: 40878

Because your pattern contains two capture groups: the outer parentheses and the inner (Cl|H|O|C|N). The documentation for re.findall says:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
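
As a quick check of that rule, this sketch runs the question's string through patterns with zero, one, and two capture groups. The variable names are just for illustration.

```python
import re

my_string = "C2H6O"

# No capture groups: each item is the whole match.
no_groups = re.findall(r"(?:Cl|H|O|C|N)[0-9]*", my_string)

# One capture group: each item is the text of that group only.
one_group = re.findall(r"(Cl|H|O|C|N)[0-9]*", my_string)

# Two capture groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"((Cl|H|O|C|N)[0-9]*)", my_string)

print(no_groups)   # ['C2', 'H6', 'O']
print(one_group)   # ['C', 'H', 'O']
print(two_groups)  # [('C2', 'C'), ('H6', 'H'), ('O', 'O')]
```

The second element of each tuple in the last result comes from the inner group, which captures only the element symbol.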

If you want to get rid of them, use this pattern:

r"(?:Cl|H|O|C|N)[0-9]*"

It removes the (unneeded) outer capture group completely and uses a non-capturing group, (?:...), for the element symbols, so findall returns the full matches as plain strings.

>>> re.findall(r"(?:Cl|H|O|C|N)[0-9]*", my_string)
['C2', 'H6', 'O']

Upvotes: 5