Reputation: 12550
I have a defaultdict that looks like this:
d = { 'ID_001': ['A', 'A_part1', 'A_part2'],
'ID_002': ['A', 'A_part3'],
'ID_003': ['B', 'B_part1', 'B_part2', 'A', 'A_part4'],
'ID_004': ['C', 'C_part1', 'A', 'A_part5', 'B', 'B_part3']
}
Before I go any further, I should say that A_part1 isn't the actual string; the real strings are just a bunch of alphanumeric characters. I wrote them this way to show that A_part1 is text associated with A, if you see what I mean.
Stepping back, what I really have is a dict whose values carry their own key/value relationship, but that relationship exists only in the order the items appear in the list.
I am attempting to end up with something like this:
['ID_001 A A_part1, A_part2',
'ID_002 A A_part3',
'ID_003 B B_part1 B_part2',
'ID_003 A A_part4',
'ID_004 C C_part1',
'ID_004 A A_part5',
'ID_004 B B_part3']
I have made a variety of attempts. I keep wanting to run through each of the dict's value lists, note the character in the first position (e.g., the A), and collect items until I hit a B or a C, then stop collecting. Then I append what I have to a list that I have declared elsewhere. Ad nauseam.
I'm running into all sorts of problems, not least of which is bloated code. I can't find a clean way to iterate through each value list, and I invariably end up with index errors.
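A minimal sketch of the loop described above, written so that it never indexes by position and therefore cannot raise an IndexError. It assumes, as the placeholder data suggests, that a new group starts whenever the first character changes; split_runs is a hypothetical helper name, and its key function would need replacing for the real alphanumeric strings.

```python
def split_runs(items, key=lambda s: s[0]):
    """Split items into runs of consecutive elements that share the same key.

    The default key (first character) is an assumption taken from the
    placeholder data; the real strings will need a different rule.
    """
    runs = []
    current_key = object()  # sentinel that never equals a real key
    for item in items:
        k = key(item)
        if k != current_key:
            runs.append([])  # key changed: start a new run
            current_key = k
        runs[-1].append(item)  # always valid: the first item opened a run
    return runs


d = {'ID_001': ['A', 'A_part1', 'A_part2'],
     'ID_002': ['A', 'A_part3'],
     'ID_003': ['B', 'B_part1', 'B_part2', 'A', 'A_part4'],
     'ID_004': ['C', 'C_part1', 'A', 'A_part5', 'B', 'B_part3']}

result = []
for id_, values in sorted(d.items()):
    for run in split_runs(values):
        result.append(' '.join([id_] + run))

for line in result:
    print(line)
```

Tracking only "the key of the group I'm currently in" replaces the start/stop index bookkeeping, which is usually where the index errors creep in.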
If anyone has any ideas, philosophy, or comments, I'd be grateful.
Upvotes: 0
Views: 39
Reputation: 353009
Whenever you're trying to do something involving contiguous groups, you should think of itertools.groupby. You weren't very specific about what condition separates the groups, but if we take "the character in the first position" at face value:
from itertools import groupby

new_list = []
for key, sublist in sorted(d.items()):
    for _, group in groupby(sublist, key=lambda x: x[0]):
        new_list.append(' '.join([key] + list(group)))
produces
>>> for elem in new_list:
... print(elem)
...
ID_001 A A_part1 A_part2
ID_002 A A_part3
ID_003 B B_part1 B_part2
ID_003 A A_part4
ID_004 C C_part1
ID_004 A A_part5
ID_004 B B_part3
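One property worth knowing: groupby only merges items that are adjacent, which is exactly what makes it fit here. A short check with a hypothetical value list in which the A items appear in two separate runs:

```python
from itertools import groupby

# Hypothetical value list: 'A' items occur in two non-adjacent runs.
values = ['A', 'A_part1', 'B', 'B_part1', 'A', 'A_part2']

# groupby starts a new group every time the key changes, so the two
# 'A' runs stay separate instead of being merged.
runs = [list(group) for _, group in groupby(values, key=lambda x: x[0])]
print(runs)  # [['A', 'A_part1'], ['B', 'B_part1'], ['A', 'A_part2']]
```

This is also why only d.items() is sorted above and not the sublists: sorting the sublists could separate parts from their headers once the real strings no longer share a prefix.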
Upvotes: 0
Reputation: 588
This may not be in the order you want, but it should save you further headaches.
d = { 'ID_001': ['A', 'A_part1', 'A_part2'],
'ID_002': ['A', 'A_part3'],
'ID_003': ['B', 'B_part1', 'B_part2', 'A', 'A_part4'],
'ID_004': ['C', 'C_part1', 'A', 'A_part5', 'B', 'B_part3']
}
rst = []
for o in d:
    t_d = {}  # first character -> items that start with it
    for t_o in d[o]:
        if t_o[0] not in t_d:
            t_d[t_o[0]] = [t_o]
        else:
            t_d[t_o[0]].append(t_o)
    for t_o in t_d:
        # header, then its parts joined with ", "
        rst.append(' '.join([o, t_d[t_o][0], ', '.join(t_d[t_o][1:])]))
print(rst)
['ID_004 C C_part1', 'ID_004 A A_part5', 'ID_004 B B_part3', 'ID_003 A A_part4', 'ID_003 B B_part1, B_part2', 'ID_002 A A_part3', 'ID_001 A A_part1, A_part2']
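Unlike groupby, this approach buckets every item by its first character across the whole list, so if the same header appears twice under one ID, both runs are merged into one line and the second header lands among the parts. A small sketch of that behaviour on a hypothetical input; group_by_first_char is an illustrative name, and it uses dict.setdefault in place of the if/else above.

```python
def group_by_first_char(values):
    """Bucket items by first character, as the loop above does per ID."""
    buckets = {}
    for item in values:
        # Items sharing a first character land in the same bucket,
        # even when they were not adjacent in the input list.
        buckets.setdefault(item[0], []).append(item)
    return buckets


# Hypothetical input in which 'A' appears in two separate runs.
values = ['A', 'A_part1', 'B', 'B_part1', 'A', 'A_part2']
buckets = group_by_first_char(values)

# Same formatting as the answer: header, then the remaining items joined with ", ".
lines = [' '.join([b[0], ', '.join(b[1:])]) for b in buckets.values()]
print(lines)  # ['A A_part1, A, A_part2', 'B B_part1']
```

On Python 3.7 and later, plain dicts preserve insertion order, so the ordering caveat above mainly applies to older versions; within each ID the groups then come out in the order their first character first appears.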
Upvotes: 0
Reputation: 30200
What about something like:
d = { 'ID_001': ['A', 'A_part1', 'A_part2'],
'ID_002': ['A', 'A_part3'],
'ID_003': ['B', 'B_part1', 'B_part2', 'A', 'A_part4'],
'ID_004': ['C', 'C_part1', 'A', 'A_part5', 'B', 'B_part3']
}
def is_key(s):
    return s in ['A', 'B', 'C']

out = {}
for (k, v) in d.items():
    key = None
    for e in v:
        if is_key(e):
            key = e
        else:
            out_key = (k, key)
            out[out_key] = out.get(out_key, []) + [e]
which generates:
{('ID_001', 'A'): ['A_part1', 'A_part2'],
('ID_002', 'A'): ['A_part3'],
('ID_003', 'A'): ['A_part4'],
('ID_003', 'B'): ['B_part1', 'B_part2'],
('ID_004', 'A'): ['A_part5'],
('ID_004', 'B'): ['B_part3'],
('ID_004', 'C'): ['C_part1']}
It's important that you update the is_key function to match your actual input.
Also, the variable names are far from optimal, but since I'm not really sure what you're doing, you should be able to (and should) give them more appropriate names.
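The question asks for a list of strings rather than a dict, so one way to finish the job is sketched below. collect and to_lines are hypothetical names; is_key is still the placeholder predicate and must be adapted as noted above. The sketch also assumes every value list starts with a header, since otherwise key would still be None when the first part is seen and the string join would fail.

```python
def is_key(s):
    # Placeholder predicate from the answer; adapt it to the real header strings.
    return s in ['A', 'B', 'C']


def collect(d):
    """Build the {(ID, header): [parts]} dict, visiting IDs in sorted order."""
    out = {}
    for k in sorted(d):
        key = None
        for e in d[k]:
            if is_key(e):
                key = e  # a new header starts a new group
            else:
                out.setdefault((k, key), []).append(e)
    return out


def to_lines(out):
    # Relies on dicts keeping insertion order (guaranteed from Python 3.7),
    # so groups come out in the order they appeared in the input lists.
    return [' '.join([k, key] + parts) for (k, key), parts in out.items()]


d = {'ID_001': ['A', 'A_part1', 'A_part2'],
     'ID_002': ['A', 'A_part3'],
     'ID_003': ['B', 'B_part1', 'B_part2', 'A', 'A_part4'],
     'ID_004': ['C', 'C_part1', 'A', 'A_part5', 'B', 'B_part3']}

for line in to_lines(collect(d)):
    print(line)
```

Note that a header with no parts after it never gets an entry in the dict, so it will not produce a line either.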
Upvotes: 1