Reputation: 5235
I know how to get the most frequent element of list of list, e.g.
a = [[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[3,2]]
print max(a, key=a.count)
should print [3, 4]
even though the most frequent number is 1
for the first element and 2
for the second element.
My question is how to do the same kind of thing with Pandas.DataFrame.
For example, I'd like to know the implementation of the following method get_max_freq_elem_of_df
:
def get_max_freq_elem_of_df(df):
# do some things
return freq_list
df = pd.DataFrame([[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[4,2]])
x = get_max_freq_elem_of_df(df)
print x # => should print [3,4]
Please notice that DataFrame.mode() method does not work. For above example, df.mode()
returns [1, 2]
not [3,4]
have explained why DataFrame.mode() doesn't work.
Upvotes: 1
Views: 1249
Reputation: 375415
In python you'd use Counter*:
In [11]: from collections import Counter
In [12]: c = Counter(df.itertuples(index=False))
In [13]: c
Out[13]: Counter({(3, 4): 3, (1, 2): 2, (1, 3): 1, (2, 2): 1, (4, 2): 1, (1, 1): 1})
In [14]: c.most_common(1) # get the top 1 most common items
Out[14]: [((3, 4), 3)]
In [15]: c.most_common(1)[0][0] # get the item (rather than the (item, count) tuple)
Out[15]: (3, 4)
* Note that your solution
max(a, key=a.count)
(although it works) is O(N^2), since on each iteration it needs to iterate through a
(to get the count), whereas Counter is O(N).
Upvotes: 2
Reputation: 353009
You could use groupby.size
and then find the max:
>>> df.groupby([0,1]).size()
0 1
1 1 1
2 2
3 1
2 2 1
3 4 3
4 2 1
dtype: int64
>>> df.groupby([0,1]).size().idxmax()
(3, 4)
Upvotes: 3