Reputation: 367
I'm trying to parse a data file with regular expressions. The file is structured like this example:
[foo1.uA]
[foo1.uA]
[foo1.uB]
[foo1.uA foo1.uB]
[foo1.uA foo1.uD]
[foo1.uD foo1.uA]
[foo1.uA foo1.uB foo1.uD]
In this example, the required result is:
Only uA = 2
Only uB = 1
uA and uB = 1
uA and uD = 2
uA, uB, uD = 1
For starters, I have a variable holding all possible combinations, but I am not sure how this can or should be parsed using regex. Any assistance would be appreciated, thanks!
Clarification: what I tried was re.search:
matchLine = re.search(r'foo1.uA', line, re.I|re.S)
if matchLine:
    relevantLines.append(line)
But then I don't know how to tell the different cases apart: lines that contain only uA, only uB, or a combination of two or more units.
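The difficulty with the re.search attempt is that it stops at the first match, so it can tell you that a line contains uA but not what else is on that line. re.findall returns every match instead. A small sketch on one sample line, assuming unit names are "u" followed by word characters (the pattern with the capture group is an assumption, not taken from the question):

```python
import re

line = "[foo1.uA foo1.uB foo1.uD]"
# Assumption: unit names are "u" followed by word characters; the group captures just the name
pattern = r"foo1\.(u\w+)"

# re.search stops at the first match and returns a single Match object
first = re.search(pattern, line)
print(first.group(1))  # uA

# re.findall returns every match; with one group it returns only the captured text
units = re.findall(pattern, line)
print(units)  # ['uA', 'uB', 'uD']
```

Putting each line's findall result into a frozenset gives one hashable key per combination, independent of the order in which the units appear.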
Upvotes: 0
Views: 101
Reputation: 20015
You can combine a collections.Counter with a regular expression:
l = [
"foo1.uA",
"foo1.uA",
"foo1.uB",
"foo1.uA foo1.uB",
"foo1.uA foo1.uD",
"foo1.uD foo1.uA",
"foo1.uA foo1.uB foo1.uD"
]
import re
from collections import Counter
pattern = re.compile(r"foo1\.u.")
c = Counter(frozenset(pattern.findall(s)) for s in l)
Result:
>>> c
Counter({frozenset(['foo1.uA', 'foo1.uD']): 2, frozenset(['foo1.uA']): 2, frozenset(['foo1.uA', 'foo1.uB', 'foo1.uD']): 1, frozenset(['foo1.uB']): 1, frozenset(['foo1.uA', 'foo1.uB']): 1})
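To get from a Counter of frozensets to the wording the question asks for ("Only uA = 2", "uA and uD = 2", ...), the keys can be turned into labels. This is a sketch that assumes unit names match u\w+ and uses a capture group so only the unit name is kept; the label helper is hypothetical, written just for this example.

```python
import re
from collections import Counter

lines = [
    "[foo1.uA]",
    "[foo1.uA]",
    "[foo1.uB]",
    "[foo1.uA foo1.uB]",
    "[foo1.uA foo1.uD]",
    "[foo1.uD foo1.uA]",
    "[foo1.uA foo1.uB foo1.uD]",
]

# Assumption: every item is "foo1." followed by a unit name starting with "u";
# the capture group keeps only the unit name (e.g. "uA").
pattern = re.compile(r"foo1\.(u\w+)")
counts = Counter(frozenset(pattern.findall(s)) for s in lines)


def label(units):
    """Hypothetical helper: turn a set of unit names into the question's wording."""
    names = sorted(units)
    if len(names) == 1:
        return "Only " + names[0]
    if len(names) == 2:
        return " and ".join(names)
    return ", ".join(names)


# Counter keeps first-seen order (Python 3.7+), so output follows the input order
for units, n in counts.items():
    print(label(units), "=", n)
```

On the sample data this prints the five result lines from the question, in the same order. Using a frozenset as the key is what makes [foo1.uA foo1.uD] and [foo1.uD foo1.uA] count together.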
Upvotes: 2
Reputation: 42758
Regular expressions are for pattern matching, not for counting.
Plain Python string operations are enough here:
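from collections import Counter

def parse_lines(lines):
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            # drop the surrounding [ ] and split on whitespace; sorting makes
            # "uA uD" and "uD uA" count as the same combination
            yield tuple(sorted(line[1:-1].split()))

def main(filename):
    with open(filename) as lines:
        result = Counter(parse_lines(lines))
    for key, cnt in result.items():
        print(key, '=', cnt)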
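Because this approach only needs an iterable of lines, it can be tried without a data file by wrapping the sample in io.StringIO. The sketch below is a self-contained variant; count_combinations is a hypothetical name, and it uses frozenset keys (rather than tuples) so the order of items within a line does not matter.

```python
import io
from collections import Counter

SAMPLE = """[foo1.uA]
[foo1.uA]
[foo1.uB]
[foo1.uA foo1.uB]
[foo1.uA foo1.uD]
[foo1.uD foo1.uA]
[foo1.uA foo1.uB foo1.uD]
"""


def count_combinations(lines):
    """Count bracketed combinations, ignoring the order of items within a line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # drop the surrounding [ ] and split on whitespace
        counts[frozenset(line[1:-1].split())] += 1
    return counts


counts = count_combinations(io.StringIO(SAMPLE))
for key, cnt in counts.items():
    print(sorted(key), "=", cnt)
```

With a real file, passing the open file object instead of the StringIO works the same way, since both iterate line by line.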
Upvotes: 2