Reputation: 19
import re

my_string = "C2H6O"
a = re.findall("((Cl|H|O|C|N)[0-9]*)", my_string)
print(a)
The output is [("C2", "C"), ("H6", "H"), ("O", "O")], but I expected ["C2", "H6", "O"].
I somewhat understand why I get tuples, but I don't see what in this code produces the second element of each tuple, such as the "C" in ("C2", "C").
Upvotes: 0
Views: 1094
Reputation: 38937
You can change your regular expression to:
re.findall("([Cl|H|O|C|N][0-9]*)", my_string)
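You will get what you expect for this input. This removes the inner group, but note that the square brackets turn the alternatives into a character class, so a two-letter symbol such as Cl is no longer matched as a unit.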
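As a rough check of how the bracketed pattern behaves on other inputs, the sketch below compares it with a plain non-capturing alternation. Square brackets form a character class, so `[Cl|H|O|C|N]` matches any single one of the characters C, l, |, H, O, N. The formula "CCl4" is an assumed example input, not taken from the question.

```python
import re

# Pattern from this answer: the brackets make a character class.
char_class = "([Cl|H|O|C|N][0-9]*)"
# Non-capturing alternation, where "Cl" is tried as a whole symbol.
alternation = r"(?:Cl|H|O|C|N)[0-9]*"

# The question's input happens to work with the character class.
ethanol = re.findall(char_class, "C2H6O")
print(ethanol)   # ['C2', 'H6', 'O']

# Assumed example input containing chlorine.
split_cl = re.findall(char_class, "CCl4")
print(split_cl)  # ['C', 'C', 'l4']

whole_cl = re.findall(alternation, "CCl4")
print(whole_cl)  # ['C', 'Cl4']
```

With the character class, "Cl" is split into "C" and "l4", whereas the alternation keeps "Cl4" together.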
Upvotes: -1
Reputation: 40878
Because your pattern contains capture groups.
From re.findall():
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
If you want to get rid of them, use this pattern:
r"(?:Cl|H|O|C|N)[0-9]*"
It removes the (unneeded) outer capture group completely and uses a non-capturing group for the alpha characters.
>>> re.findall(r"(?:Cl|H|O|C|N)[0-9]*", my_string)
['C2', 'H6', 'O']
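To illustrate the quoted rule, here is a small sketch that runs the same element pattern with zero, one, and two capture groups. The variable names are only for illustration; the two-group pattern is the one from the question.

```python
import re

my_string = "C2H6O"

# No capture groups: findall returns the full matches as strings.
no_groups = re.findall(r"(?:Cl|H|O|C|N)[0-9]*", my_string)
print(no_groups)   # ['C2', 'H6', 'O']

# One capture group: findall returns only what that group captured.
one_group = re.findall(r"(Cl|H|O|C|N)[0-9]*", my_string)
print(one_group)   # ['C', 'H', 'O']

# Two capture groups (the question's pattern): a list of tuples,
# one tuple entry per group, outer group first.
two_groups = re.findall(r"((Cl|H|O|C|N)[0-9]*)", my_string)
print(two_groups)  # [('C2', 'C'), ('H6', 'H'), ('O', 'O')]
```

The second element of each tuple comes from the inner group `(Cl|H|O|C|N)`, which captures the element symbol on its own.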
Upvotes: 5