Reputation: 854
I am trying to obtain a frequency count of key-value pairs from a Pandas data frame, using column names as keys.
Although there are several related questions, none really helped me achieve what I wanted, so I wrote a nested for loop to achieve my goal:
#first get all key value pairs
d = {}
for var in list(df.columns.values):
d[var] = df[var].unique().tolist()
#then count
d_num = {}
for k,l in d.iteritems():
for v in l:
d_num[(k+'_'+str(v))] = len(df[df[k]==v])
freqs = Counter(d_num.values()).most_common()
Obviously, this is slow, but I can't think of a way to do it using vectorized methods. Any ideas?
Edit: Here is a sample input:
Name Animal Legs Color
0 Foo Dog 4 Brown
1 Bar Cat 4 White
2 Baz Cat 4 Black
3 Foo Bird 2 Black
4 Foo Dog 4 White
Output (not necessarily a dictionary, and not necessarily with that kind of keys):
out = {Name_Foo: 3, Name_Bar:1, Name_Baz:1, Animal_Dog:2, Animal_Cat:2, Animal_Bird:1, Legs_2:1, Legs_4:4, Color_Brown:1, Color_White:2, Color_Black:2}
Upvotes: 3
Views: 3589
Reputation: 394159
You can use value_counts
for each column and assign to this a dict to produce a dict of Series:
In [19]:
temp={}
for col in df:
temp[col] = df[col].value_counts()
temp
Out[19]:
{'Animal': Dog 2
Cat 2
Bird 1
dtype: int64, 'Name': Foo 3
Baz 1
Bar 1
dtype: int64, 'Legs': 4 4
2 1
dtype: int64, 'Color': Black 2
White 2
Brown 1
dtype: int64}
In [21]:
temp['Animal']
Out[21]:
Dog 2
Cat 2
Bird 1
dtype: int64
If you then wanted to access the 'Dog' count then temp['Animal']['Dog']
would output 2
Upvotes: 4