Reputation: 43189
Let's say I have the following list of tuples:
[('FRG', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
(' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
(' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
('FRG2', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
(' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4')]
How do I group these to end up with a dict like:
{'FRG': ['MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'],
'FRG2': ...}
That is to say, I'd like to glue together each part where tuple[0] is a word with the (potentially numerous) following parts where tuple[0] is empty (contains only whitespace).
I was experimenting with groupby and takewhile from itertools but haven't reached a working solution. Ideally, the solution uses one of these (for learning purposes, that is).
Upvotes: 2
Views: 761
Reputation: 152795
The functions groupby and takewhile aren't good fits for this sort of problem.
groupby
groupby groups based on a key function. To make it work here, the key function has to remember the last first tuple element it saw that wasn't whitespace, which means keeping some global state around. A function that keeps such state is said to be "impure", while most (or even all) of the itertools functions are pure.
from itertools import groupby, chain
d = [('FRG', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
(' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
(' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
('FRG2', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
(' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4')]
def keyfunc(item):
    first = item[0]
    if first.strip():
        keyfunc.state = first
    return keyfunc.state
{k: [item for idx, item in enumerate(chain.from_iterable(grp)) if idx%3 != 0] for k, grp in groupby(d, keyfunc)}
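If you would rather keep the key function itself free of state, one option (a swapped-in technique, not the approach above) is to forward-fill the keys first with itertools.accumulate and then group on the filled-in key. This is a minimal sketch, assuming the first row always has a non-blank first element; fill_forward and grouped are illustrative names only:

```python
from itertools import accumulate, groupby

d = [('FRG', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
     (' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
     (' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
     ('FRG2', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
     (' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4')]

def fill_forward(prev, cur):
    # Keep the previous key whenever the current first element is blank.
    return cur if cur.strip() else prev

# One resolved key per row: 'FRG', 'FRG', 'FRG', 'FRG2', 'FRG2'
keys = accumulate((row[0] for row in d), fill_forward)

# Pair each row with its resolved key, then group consecutive rows by that key.
grouped = {
    key: [value for _, row in rows for value in row[1:]]
    for key, rows in groupby(zip(keys, d), key=lambda pair: pair[0])
}
```

Note that groupby only merges consecutive rows, so if the same key showed up again later in the list, the later group would overwrite the earlier one in the dict.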
takewhile
takewhile needs to look ahead to determine when to stop yielding values. That means it will automatically pop one more value from the iterator than it actually uses for each group. To apply it here you would need to remember the last position and create a new iterator each time. It also needs some sort of state, because you want to take one element whose first item is not whitespace and then the following ones whose first item is whitespace only.
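The look-ahead problem is easy to see on a shared iterator. The following is only a small illustration with simplified placeholder rows (the single-letter values are made up, not the data from the question):

```python
from itertools import takewhile

rows = iter([('FRG', 'A'), (' ', 'B'), ('FRG2', 'C'), (' ', 'D')])

header = next(rows)  # ('FRG', 'A')

# Collect the rows whose first element is blank. To find out where to stop,
# takewhile has to pull ('FRG2', 'C') from the iterator and test it.
blanks = list(takewhile(lambda row: not row[0].strip(), rows))

# The row that failed the test has been consumed and is not handed back.
left = list(rows)
```

Here blanks holds only (' ', 'B') and left holds only (' ', 'D'); the ('FRG2', 'C') row that ended the run is gone, which is why the same iterator cannot simply be reused for the next group.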
One approach could look like this (but feels unnecessarily complicated):
from itertools import takewhile, islice
def takegen(inp):
    idx = 0
    length = len(inp)
    while idx < length:
        first, *rest = inp[idx]
        rest = list(rest)
        for _, *lasts in takewhile(lambda x: not x[0].strip(), islice(inp, idx+1, None)):
            rest.extend(lasts)
        idx += len(rest) // 2
        yield first, rest
dict(takegen(d))
You could simply create your own generator to make this quite easy. It's a variation of the takewhile approach, but it doesn't need external state, islice, takewhile, groupby, or any index tracking:
def gen(inp):
    # Initial values
    last = None
    for first, *rest in inp:
        if last is None:       # first encountered item
            last = first
            l = list(rest)
        elif first.strip():    # when the first tuple item isn't all whitespace
            # Yield the last "group"
            yield last, l
            # New values for the next "group"
            last = first
            l = list(rest)
        else:                  # when the first tuple item is all whitespace
            l.extend(rest)
    # Yield the last group
    yield last, l
dict(gen(d))
# {'FRG2': ['MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'],
# 'FRG': ['MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4']}
Upvotes: 1
Reputation: 150128
Not that I recommend it, but to use itertools.groupby() for this, you'd need a key function that remembers the last used key. Something like this:
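from itertools import groupby

def keyfunc(item, keys=[None]):
    # The mutable default is deliberate: it remembers the last key between calls
    if item[0] != keys[-1] and not item[0].startswith(" "):
        keys.append(item[0])
    return keys[-1]

# lst is the list of tuples from the question
d = {k: [y for x in g for y in x[1:]] for k, g in groupby(lst, key=keyfunc)}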
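One caveat with the mutable default argument: the keys list lives as long as the function does, so the remembered key carries over into any later groupby() call that uses the same keyfunc. A possible way around that, sketched here with a hypothetical make_keyfunc factory (not part of itertools), is to build a fresh key function with its own state for each run. It uses the same startswith(" ") test, so it assumes blank first elements begin with a space:

```python
from itertools import groupby

def make_keyfunc():
    last = None  # state private to the key function returned below

    def keyfunc(item):
        nonlocal last
        # Same blank test as above: a leading space marks a continuation row.
        if not item[0].startswith(" "):
            last = item[0]
        return last

    return keyfunc

lst = [('FRG', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
       (' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
       (' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
       ('FRG2', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
       (' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4')]

d = {k: [y for x in g for y in x[1:]] for k, g in groupby(lst, key=make_keyfunc())}

# A second run starts from a clean slate instead of inheriting 'FRG2'.
second_keys = [k for k, _ in groupby([(' ', 'x', 'y')], key=make_keyfunc())]
```

With the shared-default version, that second run would have been filed under the key left over from the first one.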
A simple for loop looks cleaner and doesn't require any imports:
d, key = {}, None
for item in lst:
    if item[0] != key and not item[0].startswith(" "):
        key = item[0]
    d.setdefault(key, []).extend(item[1:])
Upvotes: 3
Reputation: 92894
A solution using collections.defaultdict (a dict subclass):
import collections

l = [('FRG', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
(' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
(' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'),
('FRG2', 'MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE '),
(' ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4')]
d = collections.defaultdict(list)
k = ''
for t in l:
    if t[0].strip():    # if the 1st value of a tuple is not empty
        k = t[0]        # capturing dict key
    if k:
        d[k].append(t[1])
        d[k].append(t[2])
print(dict(d))
The output:
{'FRG2': ['MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4'], 'FRG': ['MCO TPA PIE SRQ', 'WAVEY EMJAY J174 SWL CEBEE ', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4', 'FMY RSW APF', 'WETRO DIW AR22 JORAY HILEY4']}
Upvotes: 2