Reputation: 3014
I'm trying to find the best way to aggregate values (value pairs) from a list in Python.
foo = [
{'color': 'yellow', 'type': 'foo'},
{'color': 'yellow', 'type': 'bar'},
{'color': 'red', 'type': 'foo'},
{'color': 'red', 'type': 'foo'},
{'color': 'green', 'type': 'foo'},
{'color': 'red', 'type': 'bar'}
]
The end goal is something like:
newFoo = [
{'color': 'yellow', 'type': 'foo', 'count': 1},
{'color': 'yellow', 'type': 'bar', 'count': 1},
{'color': 'red', 'type': 'foo', 'count': 2},
{'color': 'red', 'type': 'bar', 'count': 1},
{'color': 'green', 'type': 'foo', 'count': 1}
]
I'm not very good with Python, but I have been trying to accomplish this, and this is about as far as I can get:
def loop(ar):
    dik = []
    for line in ar:
        blah = []
        for k,v in line.items():
            blah.append({k,v})
        blah.append({'count':'1'})
        dik.append(blah)
    print(dik)
Any help appreciated.
Upvotes: 1
Views: 211
Reputation: 3826
I tried to work with your original code as much as I could. What I added was to sort things and then track whether each item matched the preceding one.
# list sort/count routine
def loop(ar):
    dik = []
    # Sort by (color, type) so we need only check the preceding one for a repeat.
    # Dicts cannot be compared with < in Python 3, so an explicit key is required.
    # This sorts the caller's list in place, which we believe is harmless.
    ar.sort(key=lambda d: (d['color'], d['type']))
    blah = {'color': '', 'type': '', 'count': 0}  # initialize blah to something that will not match
    for line in ar:
        if blah['color'] == line['color'] and blah['type'] == line['type']:
            blah['count'] = blah['count'] + 1  # still accumulating count in blah
        else:  # first of this one
            if blah['color'] != '':  # add previous one, if any
                dik.append(blah)
            blah = {'color': line['color'], 'type': line['type'], 'count': 1}
    if blah['color'] != '':  # add the last one
        dik.append(blah)
    return dik

foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'yellow', 'type': 'bar'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'green', 'type': 'foo'},
    {'color': 'red', 'type': 'bar'}
]
newFoo = loop(foo)
print(newFoo)
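The sort-then-compare idea above is essentially what itertools.groupby does for you. Here is a minimal sketch of that variant; the function name count_groups is just an illustrative choice, and it uses sorted() so the original list is not reordered.

```python
from itertools import groupby

def count_groups(records):
    """Count records that share the same (color, type) pair."""
    key = lambda d: (d['color'], d['type'])
    result = []
    # groupby only merges adjacent equal keys, so the input must be sorted first.
    # sorted() returns a new list, leaving the caller's list untouched.
    for (color, type_), group in groupby(sorted(records, key=key), key=key):
        result.append({'color': color, 'type': type_, 'count': sum(1 for _ in group)})
    return result

foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'yellow', 'type': 'bar'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'green', 'type': 'foo'},
    {'color': 'red', 'type': 'bar'}
]
new_foo = count_groups(foo)
print(new_foo)
```

As with the hand-written loop, the result comes back ordered by color and then type rather than in the original order.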
Upvotes: 0
Reputation: 1734
Haha, this took me longer than I want to admit and there are a lot of better answers, but I did this in an old-fashioned way and maybe this helps you understand how to achieve it without fancy libraries.
# Copy the list (and each dict in it) before making any checks,
# so the inner loop has entries to search and foo itself is not modified.
# (A plain `new_foo = foo` would only create a second name for the same list.)
new_foo = [dict(item) for item in foo]
for old in foo:  # for each item in the old list
    for new in new_foo:  # we look for that item in the new one
        if old['type'] == new['type'] and old['color'] == new['color']:  # and if those 2 keys match
            if 'count' not in new:  # we try to find the count key
                new['count'] = 1  # add it if it wasn't found
            else:
                new['count'] = new['count'] + 1  # add 1 if it was found
            break  # and then stop looking, break the 2nd loop.
That should add counts on every item that we want to count. However, it leaves the repeated ones without a count key.
{'color': 'yellow', 'type': 'foo', 'count': 1}
{'color': 'yellow', 'type': 'bar', 'count': 1}
{'color': 'red', 'type': 'foo', 'count': 2}
{'color': 'red', 'type': 'foo'}
{'color': 'green', 'type': 'foo', 'count': 1}
{'color': 'red', 'type': 'bar', 'count': 1}
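Sadly, those leftover entries are still in our new list, so let's use the missing count key to filter them out.
# Build a new list instead of calling remove() inside the loop:
# removing items from a list while iterating over it can skip entries.
new_foo = [item for item in new_foo if 'count' in item]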
Result:
{'color': 'yellow', 'type': 'foo', 'count': 1}
{'color': 'yellow', 'type': 'bar', 'count': 1}
{'color': 'red', 'type': 'foo', 'count': 2}
{'color': 'green', 'type': 'foo', 'count': 1}
{'color': 'red', 'type': 'bar', 'count': 1}
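The nested loops above compare each item against the items before it. A plain dict keyed by the (color, type) pair can do the same job in a single pass, still without any libraries. This is only a sketch; the name count_pairs is made up for illustration, and it relies on dicts preserving insertion order (Python 3.7+).

```python
def count_pairs(records):
    counts = {}  # maps (color, type) -> the output dict for that pair
    for item in records:
        key = (item['color'], item['type'])
        if key not in counts:
            # first time we see this pair: start a new entry
            counts[key] = {'color': item['color'], 'type': item['type'], 'count': 1}
        else:
            # seen before: just bump the counter
            counts[key]['count'] += 1
    return list(counts.values())

foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'yellow', 'type': 'bar'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'green', 'type': 'foo'},
    {'color': 'red', 'type': 'bar'}
]
new_foo = count_pairs(foo)
for row in new_foo:
    print(row)
```

The rows come out in the order each pair was first seen, and foo itself is left unchanged.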
I am aware that there are better answers, but I think understanding the basics is important before dealing with advanced techniques. We can check for a key in a dict and add one easily this way:
if 'my_made_up_key' not in my_dict:  # check whether the key is missing
    my_dict['my_made_up_key'] = my_value  # add new key to a dict
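For counting in particular, the check-then-assign pattern can be shortened with the built-in dict methods get() and setdefault(). A small illustration, with made-up variable names:

```python
tally = {}
for word in ['red', 'green', 'red']:
    # get() returns the default (0) when the key is missing,
    # so no explicit membership check is needed
    tally[word] = tally.get(word, 0) + 1
print(tally)

settings = {'color': 'red'}
settings.setdefault('count', 1)       # key is missing, so it is added
settings.setdefault('color', 'blue')  # key already exists, so nothing changes
print(settings)
```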
Upvotes: 0
Reputation: 195448
You can use Counter from collections:
from collections import Counter
from pprint import pprint
foo = [
{'color': 'yellow', 'type': 'foo'},
{'color': 'yellow', 'type': 'bar'},
{'color': 'red', 'type': 'foo'},
{'color': 'red', 'type': 'foo'},
{'color': 'green', 'type': 'foo'},
{'color': 'red', 'type': 'bar'}
]
c = Counter((i['color'], i['type']) for i in foo)
pprint([{'color': k[0], 'type': k[1], 'count': v} for k, v in c.items()])
Output:
[{'color': 'yellow', 'count': 1, 'type': 'foo'},
{'color': 'yellow', 'count': 1, 'type': 'bar'},
{'color': 'red', 'count': 2, 'type': 'foo'},
{'color': 'green', 'count': 1, 'type': 'foo'},
{'color': 'red', 'count': 1, 'type': 'bar'}]
Edit:
If you want to sort the new list, you can do something like this:
newFoo = [{'color': k[0], 'type': k[1], 'count': v} for k, v in c.items()]
l = sorted(newFoo, key=lambda v: (v['color'], v['type']), reverse=True)
pprint(l)
Will print:
[{'color': 'yellow', 'count': 1, 'type': 'foo'},
{'color': 'yellow', 'count': 1, 'type': 'bar'},
{'color': 'red', 'count': 2, 'type': 'foo'},
{'color': 'red', 'count': 1, 'type': 'bar'},
{'color': 'green', 'count': 1, 'type': 'foo'}]
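If the goal is instead to order by how often each pair occurs, Counter.most_common() returns the pairs from most to least frequent, with ties kept in first-seen order. A short sketch of that variation (the name by_count is only illustrative):

```python
from collections import Counter

foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'yellow', 'type': 'bar'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'green', 'type': 'foo'},
    {'color': 'red', 'type': 'bar'}
]
c = Counter((d['color'], d['type']) for d in foo)
# most_common() yields ((color, type), count) tuples, highest count first
by_count = [{'color': color, 'type': type_, 'count': n}
            for (color, type_), n in c.most_common()]
for row in by_count:
    print(row)
```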
Edit:
Thanks to @MadPhysicist, you can generalize the above example:
c = Counter(tuple(item for item in i.items()) for i in foo)
pprint([{**dict(k), 'count': v} for k, v in c.items()])
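One caveat with the generalized version: a tuple of items depends on the order in which the keys were inserted, so two dicts with the same content but different key order would be counted separately. A possible workaround, assuming all values are hashable, is to key on a frozenset of the items instead. The sample records below are made up to show the difference.

```python
from collections import Counter

records = [
    {'color': 'red', 'type': 'foo'},
    {'type': 'foo', 'color': 'red'},   # same content, keys in a different order
    {'color': 'green', 'type': 'foo'},
]

# tuple keys are order-sensitive: the first two records get separate counts
ordered = Counter(tuple(d.items()) for d in records)
# frozenset keys ignore key order: the first two records are merged
unordered = Counter(frozenset(d.items()) for d in records)

result = [{**dict(k), 'count': v} for k, v in unordered.items()]
print(len(ordered), len(unordered))
print(result)
```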
Upvotes: 2
Reputation: 3195
Here's an easy option if you don't mind duplicates. If you want only one record, Andrej's answer with Counter is great.
newFoo = [dict(d, **{'count': foo.count(d)}) for d in foo]
>>> newFoo
[{'color': 'yellow', 'type': 'foo', 'count': 1},
{'color': 'yellow', 'type': 'bar', 'count': 1},
{'color': 'red', 'type': 'foo', 'count': 2},
{'color': 'red', 'type': 'foo', 'count': 2},
{'color': 'green', 'type': 'foo', 'count': 1},
{'color': 'red', 'type': 'bar', 'count': 1}]
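If the duplicates turn out to matter after all, they can be dropped afterwards while keeping the original order. A rough sketch; note that both list.count() and the `not in` check rescan the list each time, so this is only sensible for small inputs.

```python
foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'yellow', 'type': 'bar'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'green', 'type': 'foo'},
    {'color': 'red', 'type': 'bar'}
]
# same idea as above: attach the count to a copy of every record
new_foo = [dict(d, count=foo.count(d)) for d in foo]

unique = []
for d in new_foo:
    if d not in unique:  # dicts compare by content, so repeats are skipped
        unique.append(d)
print(unique)
```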
Upvotes: 1