ygoncho

Reputation: 367

Python Regex: Count appearances

I'm trying to parse a data file with regular expressions. The file's structure looks like this, for example:

[foo1.uA]
[foo1.uA]
[foo1.uB]
[foo1.uA foo1.uB]
[foo1.uA foo1.uD]
[foo1.uD foo1.uA]
[foo1.uA foo1.uB foo1.uD]

In this example, the required result is:

Only uA = 2
Only uB = 1
uA and uB = 1
uA and uD = 2
uA, uB, uD = 1

For starters, I have a variable holding all possible combinations, but I am not sure how this can or should be parsed using regex. Any assistance would be appreciated, thanks!

Clarification: What I tried to do was use re.search:

matchLine = re.search(r'foo1.uA', line, re.I|re.S)
if (matchLine):
    relevantLines.append(line)

But then I don't know how to separate the different possibilities: lines that contain only uA, only uB, or several of them together.

Upvotes: 0

Views: 101

Answers (2)

JuniorCompressor

Reputation: 20015

You can use a combination of a counter and a regular expression:

l = [
    "foo1.uA",
    "foo1.uA",
    "foo1.uB",
    "foo1.uA foo1.uB",
    "foo1.uA foo1.uD",
    "foo1.uD foo1.uA",
    "foo1.uA foo1.uB foo1.uD"
]

import re
from collections import Counter

pattern = re.compile(r"foo1\.u.")
c = Counter(frozenset(pattern.findall(s)) for s in l)

Result:

>>> c
Counter({frozenset(['foo1.uA', 'foo1.uD']): 2, frozenset(['foo1.uA']): 2, frozenset(['foo1.uA', 'foo1.uB', 'foo1.uD']): 1, frozenset(['foo1.uB']): 1, frozenset(['foo1.uA', 'foo1.uB']): 1})
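To get from that Counter to the labels used in the question, something along these lines might work. This is only a sketch: the `describe` helper and the `report` dict are hypothetical names introduced here, the capture group `(u\w)` is an assumption about what the identifiers look like, and the sort order is just chosen to mirror the listing in the question.

```python
import re
from collections import Counter

# Sample lines in the bracketed form shown in the question.
lines = [
    "[foo1.uA]",
    "[foo1.uA]",
    "[foo1.uB]",
    "[foo1.uA foo1.uB]",
    "[foo1.uA foo1.uD]",
    "[foo1.uD foo1.uA]",
    "[foo1.uA foo1.uB foo1.uD]",
]

# Capture only the short name (uA, uB, ...) after the "foo1." prefix.
pattern = re.compile(r"foo1\.(u\w)")

def describe(names):
    """Build a label such as 'Only uA', 'uA and uB' or 'uA, uB, uD'."""
    names = sorted(names)
    if len(names) == 1:
        return "Only " + names[0]
    if len(names) == 2:
        return " and ".join(names)
    return ", ".join(names)

# frozenset makes the key independent of the order within a line.
counts = Counter(frozenset(pattern.findall(s)) for s in lines)
report = {describe(combo): n for combo, n in counts.items()}

# Print smaller combinations first, then alphabetically.
for combo, n in sorted(counts.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(describe(combo), "=", n)
```

Because the keys are frozensets, `[foo1.uA foo1.uD]` and `[foo1.uD foo1.uA]` are counted together.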

Upvotes: 2

Daniel

Reputation: 42758

Regular expressions are for pattern matching, not for counting.

One would use Python string operations:

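from collections import Counter

def parse_lines(lines):
    for line in lines:
        # Drop the surrounding brackets, split on whitespace, and sort
        # so that "uA uD" and "uD uA" count as the same combination.
        yield tuple(sorted(line.strip()[1:-1].split()))

def main(filename):
    with open(filename) as lines:
        result = Counter(parse_lines(lines))
    for key, cnt in result.items():
        print(key, '=', cnt)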
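To try the string-based approach without a data file, here is a small sketch that uses `io.StringIO` as a stand-in for the opened file. It assumes the bracketed format from the question, skips blank lines, and sorts the tokens on each line so that `[foo1.uA foo1.uD]` and `[foo1.uD foo1.uA]` end up under the same key, which the expected result in the question requires.

```python
import io
from collections import Counter

def parse_lines(lines):
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        # Strip "[" and "]", split on whitespace, sort for a stable key.
        yield tuple(sorted(line[1:-1].split()))

# In-memory stand-in for the data file described in the question.
sample = io.StringIO(
    "[foo1.uA]\n"
    "[foo1.uA]\n"
    "[foo1.uB]\n"
    "[foo1.uA foo1.uB]\n"
    "[foo1.uA foo1.uD]\n"
    "[foo1.uD foo1.uA]\n"
    "[foo1.uA foo1.uB foo1.uD]\n"
)

result = Counter(parse_lines(sample))

# Smaller combinations first, then alphabetical within each size.
for key, cnt in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(" ".join(key), "=", cnt)
```

Without the `sorted` call, the two uA/uD lines would be counted as separate combinations.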

Upvotes: 2
