Beans On Toast

Reputation: 1081

Merging a Python list of dictionaries with repeated dictionary keys

If I have the following Python list of dictionaries:

[
    {'website': 'google.com', 'hits': 100, 'source': 'mobile'},
    {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'tablet'},
    {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'},
]

where the value for the key 'hits' stays the same (it never changes and will always be 100),

how can I combine the values of the key 'source' into a list for each website, while keeping each website as a separate dictionary inside the output list?

Basically, I want to get this output:

[
    {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
    {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'},
]

Upvotes: 1

Views: 190

Answers (4)

Monsieur Merso

Reputation: 2136

This can be solved with the groupby function from the itertools module in Python's standard library (see the itertools documentation).

import itertools

data = [
    {'website':'google.com', 'hits': 100, 'source': 'mobile'},
    {'website':'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website':'google.com', 'hits': 100, 'source': 'internet'},
    {'website':'google.com', 'hits': 100, 'source': 'tablet'},
    {'website':'youtube.com', 'hits': 100, 'source': 'mobile'},
]

keyfunc = lambda x: x['website']
# the list has to be sorted by the same key that is used for grouping
data = sorted(data, key=keyfunc)

result = []
# k is the key value (the website) that the group was formed on;
# g is an iterator over the elements of the sorted list that share that key
for k, g in itertools.groupby(data, keyfunc):
    # materialize g as a list so it can be reused for both the 'hits' and
    # 'source' fields; once g has been iterated it is exhausted and its
    # elements can no longer be accessed
    group_list = list(g)
    # collect the 'source' values into a list; this can also be done with
    # map: [*map(lambda x: x['source'], group_list)]
    grouped_sources = [group_element['source'] for group_element in group_list]
    result.append({
        'website': k,
        'hits': group_list[0]['hits'], # any aggregation over 'hits' could go here instead
        # a single-element group keeps its source as a plain string; otherwise pass the list
        'source': grouped_sources if len(grouped_sources) > 1 else grouped_sources[0]
    })
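
To see why the sort matters: itertools.groupby only merges items that sit next to each other, so on the unsorted list google.com shows up as two separate groups. A small sketch of that, where group_keys is just a helper name made up for this illustration:

```python
import itertools

data = [
    {'website': 'google.com', 'hits': 100, 'source': 'mobile'},
    {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'tablet'},
    {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'},
]


def group_keys(records):
    """Return the key of every group that groupby yields, in order (hypothetical helper)."""
    return [k for k, _ in itertools.groupby(records, key=lambda x: x['website'])]


# Without sorting, only consecutive equal keys are merged
unsorted_keys = group_keys(data)
# After sorting by the grouping key, each website forms exactly one group
sorted_keys = group_keys(sorted(data, key=lambda x: x['website']))

print(unsorted_keys)  # ['google.com', 'facebook.com', 'google.com', 'youtube.com']
print(sorted_keys)    # ['facebook.com', 'google.com', 'youtube.com']
```

Note that sorting also means the result comes out in alphabetical order of website rather than in the order of first appearance.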

Upvotes: 0

Nivedita Deshmukh

Reputation: 309

import pandas as pd

data = [{'website':'google.com', 'hits': 100, 'source': 'mobile'}, 
        {'website':'facebook.com', 'hits': 100, 'source': 'internet'},
        {'website':'google.com', 'hits': 100, 'source': 'internet'},
        {'website':'google.com', 'hits': 100, 'source': 'tablet'},
        {'website':'youtube.com', 'hits': 100, 'source': 'mobile'}]

df = pd.DataFrame(data)     # convert data to pandas dataframe

print(df)

        website  hits    source
0    google.com   100    mobile
1  facebook.com   100  internet
2    google.com   100  internet
3    google.com   100    tablet
4   youtube.com   100    mobile

output = df.groupby(['website', 'hits'])['source'].apply(list).reset_index().to_dict(orient='records')

print(output)
[{'website': 'facebook.com', 'hits': 100, 'source': ['internet']}, 
 {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
 {'website': 'youtube.com', 'hits': 100, 'source': ['mobile']}]
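
This output wraps every source in a list, while the question wants a plain string when a website has only one source. One possible post-processing step, sketched in plain Python so it does not depend on pandas: unwrap_singletons is a name invented here, and the output list is copied from the printed result above.

```python
def unwrap_singletons(records, key='source'):
    """Return copies of the records where a one-element list under `key`
    is replaced by its only element (hypothetical helper)."""
    unwrapped = []
    for record in records:
        record = dict(record)  # shallow copy so the input records are untouched
        value = record[key]
        if isinstance(value, list) and len(value) == 1:
            record[key] = value[0]
        unwrapped.append(record)
    return unwrapped


# The grouped output as printed above
output = [
    {'website': 'facebook.com', 'hits': 100, 'source': ['internet']},
    {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
    {'website': 'youtube.com', 'hits': 100, 'source': ['mobile']},
]

result = unwrap_singletons(output)
print(result)
```

Whether mixing strings and lists under the same key is a good idea is debatable, since consumers then have to check the type; keeping every source as a list may be simpler downstream.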

Upvotes: 0

foderking

Reputation: 43

Create a new list, and check whether an entry for the website already exists before adding it.

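new = []
for each in initial:  # initial is the original list of dictionaries
  # index of an already-added entry for the same website, or None
  i = next((idx for idx, item in enumerate(new) if item['website'] == each['website']), None)
  if i is None:
    # new does not have it yet; copy so the original dictionary is not modified
    new.append(dict(each))
  elif isinstance(new[i]['source'], list):
    new[i]['source'].append(each['source'])
  else:
    # second occurrence: turn the single string into a list
    new[i]['source'] = [new[i]['source'], each['source']]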

I wrote this on my phone so there might be some errors. But you get the idea
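
The same "check before adding" idea can also be sketched with a dictionary keyed by website, which avoids scanning the new list for every item and keeps the order of first appearance. This is only one way to do it; merge_by_website is a made-up name, and it assumes Python 3.7+ so that dictionaries preserve insertion order.

```python
def merge_by_website(records):
    """Merge records that share a 'website', collecting their 'source' values
    (hypothetical helper; assumes every record has 'website' and 'source')."""
    merged = {}  # website -> merged record, in order of first appearance
    for record in records:
        site = record['website']
        if site not in merged:
            # first occurrence: store a copy so the input is not modified
            merged[site] = dict(record)
            continue
        existing = merged[site]['source']
        if isinstance(existing, list):
            existing.append(record['source'])
        else:
            # second occurrence: turn the single string into a list
            merged[site]['source'] = [existing, record['source']]
    return list(merged.values())


initial = [
    {'website': 'google.com', 'hits': 100, 'source': 'mobile'},
    {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'tablet'},
    {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'},
]

merged_records = merge_by_website(initial)
print(merged_records)
```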

Upvotes: 1

K.Mat

Reputation: 1620

Using pandas.DataFrame:

import pandas as pd

data = [
    {'website':'google.com', 'hits': 100, 'source': 'mobile'}, 
    {'website':'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website':'google.com', 'hits': 100, 'source': 'internet'},
    {'website':'google.com', 'hits': 100, 'source': 'tablet'},
    {'website':'youtube.com', 'hits': 100, 'source': 'mobile'},
]
df = pd.DataFrame(data)

This will create a DataFrame like this:

> df.head()
        website  hits    source
0    google.com   100    mobile
1  facebook.com   100  internet
2    google.com   100  internet
3    google.com   100    tablet
4   youtube.com   100    mobile

Then you can group by the website column and save the result in your desired format:

new_data = []
for item in df.groupby('website'):
    new_data.append({
        'website': item[0],
        'hits': 100,
        'source': list(item[1]['source'])
    })
print(new_data)
# [
#     {'website': 'facebook.com', 'hits': 100, 'source': ['internet']},
#     {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
#     {'website': 'youtube.com', 'hits': 100, 'source': ['mobile']}
# ]

Upvotes: 2
