Reputation: 1160
Using re.findall, I want to extract the values assigned to each PCR.
>>> z
'PCR-09: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-13: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-14: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-16: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\n
>>> print z
PCR-09: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-13: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-14: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-16: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Initially, I tried this but can someone point out what is wrong with the regex used?
>>> re.search('PCR-09:(.*?)', z).groups()
('',)
Shouldn't the non-greedy expr (.*?)
match all characters until it finds a newline?
With a slightly modified regex, I get the desired result:
>>> re.search('PCR-09:(.*?)\s\r\n', z).groups()
(' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00',)
On the same lines, this doesn't work:
>>> re.findall(r'(PCR-\d+):(.*?)', z)
[('PCR-09', ''), ('PCR-10', ''), ('PCR-11', ''), ('PCR-12', ''), ('PCR-13', ''), ('PCR-14', ''), ('PCR-15', ''), ('PCR-16', ''),
But this does:
>>> re.findall(r'(PCR-\d+):(.*?)\s\r\n', z,re.DOTALL)
[('PCR-09', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-10', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-11', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-12', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-13', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-14', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-15', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-16', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'),
Hoping someone can explain what's wrong with my approach.
Thanks
Upvotes: 1
Views: 51
Reputation: 19743
try using: split
[ x.split(':') for x in z.split('\r\n')]
output:
[['PCR-09', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-10', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-11', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-12', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-13', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-14', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-15', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-16', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['']]
using regex
re.findall('(PCR-\d+)(.*)',z)
Upvotes: 0
Reputation: 3468
The reason r'PCR-09:(.*?)'
doesn't match what you expect is that non-greedy regexes stop as soon as they're valid.
So (.*?)
can match ''
, so the regex stops immediately.
In contrast, r'(PCR-\d+):(.*?)\s\r\n'
is non-greedy, but because it needs to find `\s\r\n', it will force the expansion to work.
I would recommend using a greedy regex that includes only the characters you expect to find: r'(PCR-\d+):([0-9 ]*)'
.
Upvotes: 3
Reputation:
The pattern PCR-09:(.*?)
tells Python to match zero or more characters non-greedily after PCR-09:
. So, it does precisely this and matches zero characters.
You need to have your Regex be greedy in order to match everything up to the newline:
>>> re.search('PCR-09:(.*)', z).groups()
(' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r',)
>>>
Note that your PCR-09:(.*?)\s\r\n
pattern worked because it told Python to get zero or more characters after PCR-09:
and up to \s\r\n
. In other words, get everything between them.
Upvotes: 2