Reputation: 1081
If I have the following Python list of dictionaries:
[{'website':'google.com', 'hits': 100, 'source': 'mobile'},
{'website':'facebook.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'tablet'},
{'website':'youtube.com', 'hits': 100, 'source': 'mobile'},
]
where the value for the key 'hits' stays the same (it never changes and will always be 100),
how can I combine the values of the key 'source' into a list, while keeping each website as a separate dictionary inside the list?
Basically, I want to get this output:
[{'website':'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
{'website':'facebook.com', 'hits': 100, 'source': 'internet'},
{'website':'youtube.com', 'hits': 100, 'source': 'mobile'}
]
Upvotes: 1
Views: 190
Reputation: 2136
This can be solved with the groupby function from the itertools module in the Python standard library (see the itertools docs):
import itertools
data = [
{'website':'google.com', 'hits': 100, 'source': 'mobile'},
{'website':'facebook.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'tablet'},
{'website':'youtube.com', 'hits': 100, 'source': 'mobile'},
]
keyfunc = lambda x: x['website']
# list has to be sorted according to the same key as grouping
data = sorted(data, key=keyfunc)
result = []
# here k - is value of the key upon which this group was grouped
# g - is a group iterator object with elements of original list grouped by key
for k, g in itertools.groupby(data, keyfunc):
    # we have to create a list from the g to be able to reuse this list
    # when we'll calculate 'hits' and 'source' fields, once g is iterated
    # it is exhausted and we won't be able to access its elements
    group_list = list(g)
    # map 'sources' key values to a list, it can be also done with
    # map function: [*map(lambda x: x['source'], group_list)]
    grouped_sources = [group_element['source'] for group_element in group_list]
    result.append({
        'website': k,
        'hits': group_list[0]['hits'],  # here can be some method for calculating stuff with hits
        # if we have only one element in a group pass it as is, otherwise pass list itself
        'source': grouped_sources if len(grouped_sources) > 1 else grouped_sources[0]
    })
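Note that the groupby approach above sorts the list first, so the result comes out ordered by website rather than by first appearance as in the question's desired output. If that order matters, a minimal sketch using a plain dict (which preserves insertion order in Python 3.7+) might look like the following. The function name `merge_sources` and the variable `merged` are my own, not part of the answer above.

```python
def merge_sources(rows):
    """Group rows by 'website', collecting 'source' values in first-seen order."""
    grouped = {}  # website -> merged row; dicts keep insertion order (Python 3.7+)
    for row in rows:
        entry = grouped.get(row['website'])
        if entry is None:
            # first time we see this website: copy the row and start a list
            grouped[row['website']] = {**row, 'source': [row['source']]}
        else:
            entry['source'].append(row['source'])
    # unwrap single-element lists so the result matches the format in the question
    for entry in grouped.values():
        if len(entry['source']) == 1:
            entry['source'] = entry['source'][0]
    return list(grouped.values())


data = [
    {'website': 'google.com', 'hits': 100, 'source': 'mobile'},
    {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'tablet'},
    {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'},
]
merged = merge_sources(data)
print(merged)
```

This makes a single pass over the input without sorting, and it leaves the original dictionaries untouched because each merged row is a copy.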
Upvotes: 0
Reputation: 309
import pandas as pd
data = [{'website':'google.com', 'hits': 100, 'source': 'mobile'},
{'website':'facebook.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'tablet'},
{'website':'youtube.com', 'hits': 100, 'source': 'mobile'}]
df = pd.DataFrame(data)  # convert data to a pandas DataFrame
print(df)
website hits source
0 google.com 100 mobile
1 facebook.com 100 internet
2 google.com 100 internet
3 google.com 100 tablet
4 youtube.com 100 mobile
output = df.groupby(['website', 'hits'])['source'].apply(list).reset_index().to_dict(orient='records')
print(output)
[{'website': 'facebook.com', 'hits': 100, 'source': ['internet']},
{'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
{'website': 'youtube.com', 'hits': 100, 'source': ['mobile']}]
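The output above wraps every 'source' in a list, even when a website has only one source, whereas the question keeps single sources as plain strings. If that distinction matters, one possible post-processing step is sketched below. The `output` list is copied in by hand so the snippet runs on its own, and the name `flattened` is my own.

```python
# `output` as printed above, copied here so the snippet is self-contained
output = [
    {'website': 'facebook.com', 'hits': 100, 'source': ['internet']},
    {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
    {'website': 'youtube.com', 'hits': 100, 'source': ['mobile']},
]

# replace one-element lists with their only element, leave longer lists alone
flattened = [
    {**row, 'source': row['source'][0] if len(row['source']) == 1 else row['source']}
    for row in output
]
print(flattened)
```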
Upvotes: 0
Reputation: 43
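Create a new list, and check whether a dictionary for the website already exists before adding it.
new = []
# initial is the original list of dictionaries
for each in initial:
    # use enumerate to get the index of any entry in new with the same website
    found = [i for i, d in enumerate(new) if d['website'] == each['website']]
    # if new does not have it already
    if not found:
        new.append(dict(each))  # copy so the input dictionary is not modified
    else:
        i = found[0]
        try:
            new[i]['source'].append(each['source'])
        except AttributeError:
            # 'source' is still a plain string, so turn it into a list
            new[i]['source'] = [new[i]['source'], each['source']]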
I wrote this on my phone so there might be some errors. But you get the idea
Upvotes: 1
Reputation: 1620
Using pandas.DataFrame:
import pandas as pd
data = [
{'website':'google.com', 'hits': 100, 'source': 'mobile'},
{'website':'facebook.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'tablet'},
{'website':'youtube.com', 'hits': 100, 'source': 'mobile'},
]
df = pd.DataFrame(data)
This will create a DataFrame like this:
> df.head()
website hits source
0 google.com 100 mobile
1 facebook.com 100 internet
2 google.com 100 internet
3 google.com 100 tablet
4 youtube.com 100 mobile
Then you can group by the website column and save the result in your desired format:
new_data = []
# each iteration yields the group key (the website) and the matching rows
for website, group in df.groupby('website'):
    new_data.append({
        'website': website,
        'hits': 100,
        'source': list(group['source'])
    })
print(new_data)
# [
# {'website': 'facebook.com', 'hits': 100, 'source': ['internet']},
# {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
# {'website': 'youtube.com', 'hits': 100, 'source': ['mobile']}
# ]
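The loop above hard-codes 'hits' as 100, which is fine given the question's guarantee that it never changes. If you would rather read it from the data, a small variation is sketched below. It assumes every row in a group has the same 'hits' value and simply takes the first one; the variable names are my own.

```python
import pandas as pd

data = [
    {'website': 'google.com', 'hits': 100, 'source': 'mobile'},
    {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'tablet'},
    {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'},
]
df = pd.DataFrame(data)

new_data = []
for website, group in df.groupby('website'):
    new_data.append({
        'website': website,
        # assumption: 'hits' is identical within a group, so take the first row's value;
        # int() converts the NumPy integer back to a plain Python int
        'hits': int(group['hits'].iloc[0]),
        'source': list(group['source']),
    })
print(new_data)
```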
Upvotes: 2