Farnaz

Reputation: 19

How to apply groupby to a list of tuples in Python?

In my function I create different tuples and append them to an initially empty list:

tup = (pattern,matchedsen)
matchedtuples.append(tup)

The patterns are regular expressions. I would like to apply groupby() to matchedtuples in the following way:

For example:

matchedtuples = [(p1, s1) , (p1,s2) , (p2, s5)]

And I am looking for this result:

result = [ (p1,(s1,s2)) , (p2, s5)]

So, in this way I will have groups of sentences with the same pattern. How can I do this?

Upvotes: 0

Views: 800

Answers (2)

Chiheb Nexus

Reputation: 9257

The code below produces the output you asked for, using only groupby from the itertools module. It works for the input structures shown here, provided that tuples with the same pattern are adjacent in the list, because groupby only groups consecutive items:

from itertools import groupby

# Let's suppose your input is something like this
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5")]

result = []

# Group consecutive tuples by their first element (the pattern)
for key, values in groupby(a, lambda x: x[0]):
    b = tuple(values)
    if len(b) >= 2:
        # Several sentences share this pattern: collect them in a tuple
        result.append((key, tuple(j[1] for j in b)))
    else:
        # Only one tuple in the group: keep it unchanged
        result.append(b[0])

print(result)

Output:

[('p1', ('s1', 's2')), ('p2', 's5')]

The same solution works if you add more values to your input:

# When you add more values to your input
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5"), ("p2", "s6"), ("p3", "s7")]

from itertools import groupby

result = []

for key, values in groupby(a, lambda x : x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(b[0])

print(result)

Output:

[('p1', ('s1', 's2')), ('p2', ('s5', 's6')), ('p3', 's7')]

Now, if you modify your input structure:

# Let's suppose your modified input is something like this
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"])]

from itertools import groupby

result = []

for key, values in groupby(a, lambda x : x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(b[0])

print(result)

Output:

[(['p1'], (['s1'], ['s2'])), (['p2'], ['s5'])]

Also, the same solution works if you add more values to your new input structure:

# When you add more values to your new input
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"]), (["p2"], ["s6"]), (["p3"], ["s7"])]

from itertools import groupby

result = []

for key, values in groupby(a, lambda x : x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(b[0])

print(result)

Output:

[(['p1'], (['s1'], ['s2'])), (['p2'], (['s5'], ['s6'])), (['p3'], ['s7'])]

P.S.: Test this code, and if it breaks with any other kind of input, please let me know.
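One kind of input that does change the result is a list in which equal patterns are not next to each other, since groupby starts a new group every time the key changes. The sketch below uses a made-up, reordered version of the sample data (the names pairs, unsorted_groups and sorted_groups are only for illustration) and shows that sorting by the first element beforehand restores the expected grouping:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input where the two "p1" tuples are not adjacent
pairs = [("p1", "s1"), ("p2", "s5"), ("p1", "s2")]

# Without sorting, "p1" appears as two separate groups
unsorted_groups = [
    (key, [s for _, s in grp])
    for key, grp in groupby(pairs, key=itemgetter(0))
]

# Sorting by the key first makes equal keys adjacent
sorted_pairs = sorted(pairs, key=itemgetter(0))
sorted_groups = [
    (key, [s for _, s in grp])
    for key, grp in groupby(sorted_pairs, key=itemgetter(0))
]

print(unsorted_groups)  # [('p1', ['s1']), ('p2', ['s5']), ('p1', ['s2'])]
print(sorted_groups)    # [('p1', ['s1', 's2']), ('p2', ['s5'])]
```

Because sorted is stable, sentences that share a pattern keep their original relative order.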

Upvotes: 1

Dimitris Fasarakis Hilliard

Reputation: 160447

If you require the output you present, you'll need to manually loop through the grouping of matchedtuples and build your list.

First, of course, if the matchedtuples list isn't sorted, sort it with itemgetter:

from itertools import groupby
from operator import itemgetter as itmg

li = sorted(matchedtuples, key=itmg(0))

Then, loop through the result supplied by groupby and append to the list r based on the size of the group:

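r = []
# Iterate over the sorted list li, not the original matchedtuples
for i, j in groupby(li, key=itmg(0)):
    j = list(j)
    # A single match keeps its bare sentence; several are packed into a tuple
    ap = (i, j[0][1]) if len(j) == 1 else (i, tuple(s[1] for s in j))
    r.append(ap)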
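Put together, the two steps could look something like the sketch below. The function name group_pairs and the sample data are only illustrative, and it assumes the patterns are plain strings (or anything else that can be sorted):

```python
from itertools import groupby
from operator import itemgetter


def group_pairs(pairs):
    """Group (pattern, sentence) pairs by pattern, in the shape asked for in the question."""
    first = itemgetter(0)
    result = []
    # Sort first so that equal patterns are adjacent, then group them
    for pattern, group in groupby(sorted(pairs, key=first), key=first):
        sentences = tuple(sentence for _, sentence in group)
        # A single sentence stays bare; several are packed into a tuple
        result.append((pattern, sentences[0] if len(sentences) == 1 else sentences))
    return result


matchedtuples = [("p1", "s1"), ("p2", "s5"), ("p1", "s2")]
print(group_pairs(matchedtuples))  # [('p1', ('s1', 's2')), ('p2', 's5')]
```

Note that the mixed output shape (a bare sentence for one match, a tuple for several) forces every consumer to check the type; always returning a tuple would be simpler if the exact format from the question is not required.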

Upvotes: 0
