prj

Reputation: 51

Count operations grouped by country and return a DataFrame in Python

Data:

country operations
India A
Malaysia B
Croatia C
India C
India C
Malaysia D
Malaysia A

Desired Output:

{"India": {"A": 1, "C": 2}, "Malaysia": {"B": 1, "A": 1, "D": 1}, "Croatia": {"C": 1}}

I have tried:

import pandas as pd

# countrylist and opslist are assumed to hold the two columns of the data above
arrays = [countrylist, opslist]
index = pd.MultiIndex.from_arrays(arrays, names=('Country', 'Ops'))
df = pd.DataFrame(index)

count = list(df[0].value_counts())
clist = list(df[0].unique())

# collect the distinct operations seen for each country
csdict = dict()
for country, service in clist:
    csdict.setdefault(country, []).append(service)

country_list = list(csdict.keys())
service_list = list(csdict.values())

fdict = {"country": country_list, "services": service_list}
dataf = pd.DataFrame(fdict)

Upvotes: 2

Views: 327

Answers (2)

Red

Reputation: 27577

Here is how you can use the built-in zip() function:

z = list(zip(df.country, df.operations))

output = dict()
for c, o in z:
    output[c] = output.get(c) or dict()
    output[c][o] = z.count((c, o))
print(output)

Output:

{'India': {'A': 1, 'C': 2}, 'Malaysia': {'B': 1, 'D': 1, 'A': 1}, 'Croatia': {'C': 1}}
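Note that z.count((c, o)) rescans the whole list for every row, so the loop above does work proportional to the square of the number of rows. A minimal single-pass sketch with collections.Counter follows. The rows list is a hypothetical stand-in for list(zip(df.country, df.operations)) so that the snippet runs without pandas:

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for list(zip(df.country, df.operations)),
# using the sample data from the question.
rows = [
    ("India", "A"),
    ("Malaysia", "B"),
    ("Croatia", "C"),
    ("India", "C"),
    ("India", "C"),
    ("Malaysia", "D"),
    ("Malaysia", "A"),
]

# One Counter per country; each (country, operation) pair is visited once.
counts = defaultdict(Counter)
for country, operation in rows:
    counts[country][operation] += 1

# Convert the Counters to plain dicts to match the desired output shape.
output = {country: dict(ops) for country, ops in counts.items()}
print(output)
```

With a real DataFrame, replacing rows with zip(df.country, df.operations) should give the same nested dict.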

Upvotes: 1

jezrael

Reputation: 863501

Use a dictionary comprehension with Series.value_counts per group:

d = {k: v.value_counts(sort=False).to_dict()
     for k, v in df.groupby('country', sort=False)['operations']}

print(d)
{'India': {'A': 1, 'C': 2}, 'Malaysia': {'B': 1, 'A': 1, 'D': 1}, 'Croatia': {'C': 1}}
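If a tabular result is wanted, as the question title suggests, one possible alternative is pd.crosstab. The sketch below rebuilds the sample data from the question so it runs on its own. The names table and nested are illustrative and do not come from the question:

```python
import pandas as pd

# Sample data from the question, rebuilt here so the snippet is self-contained.
df = pd.DataFrame({
    "country": ["India", "Malaysia", "Croatia", "India",
                "India", "Malaysia", "Malaysia"],
    "operations": ["A", "B", "C", "C", "C", "D", "A"],
})

# Rows are countries, columns are operations, cells are counts (0 if absent).
table = pd.crosstab(df["country"], df["operations"])
print(table)

# Drop the zero cells to recover the nested dict form.
nested = {
    country: {op: int(n) for op, n in row.items() if n > 0}
    for country, row in table.iterrows()
}
print(nested)
```

crosstab sorts its row and column labels, so the keys come out in alphabetical order rather than in order of first appearance. The counts themselves are the same.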

Upvotes: 1
