Compute values frequency in pandas dataframe

Question

I have a pandas dataframe df with this settings

col1 col2
v1   i1
v1   i50
v2   i60
v2   i1
v2   i8 
v10  i8
v10  i1 
v10  i2 
..

I would like to compute how many elments of col1 has a value of col2. And store the results into a dataframe with this setting

col1 frequency
i1   80
i2   195
...  ...

I tried to do this in pandas,

 item_frequency = pd.unique(relevant_data[relevant_data['col2'].isin(pd.unique(relevant_data['col2'].values.ravel()))]['col1'].values.ravel())

which is yielding the error

raise ValueError('Lengths must match to compare')
ValueError: Lengths must match to compare

PS: I'd like to do this in a vectorized manner.

roman · Accepted Answer

It's not quite clear what result you want to get, so if you want to col1, col2, frequency - then you can use groupby() and size():

In [5]: df.groupby(['col1', 'col2']).size()
Out[5]: 
col1  col2
v1    i1      1
      i50     1
v10   i1      1
      i2      1
      i8      1
v2    i1      1
      i60     1
      i8      1

If you want just calculate count of col2, then value_counts() will work:

In [6]: df['col2'].value_counts()
Out[6]: 
i1     3
i8     2
i60    1
i2     1
i50    1
dtype: int64

update

After you updated your description, I see that value_counts() could give you wrong answer if it's possible to have one value more than once per transasction. But you can solve this with drop_duplicates():

In [9]: df.drop_duplicates()['col2'].value_counts()
Out[9]: 
i1     3
i8     2
i60    1
i2     1
i50    1
dtype: int64

Compute values frequency in pandas dataframe

Answers (1)

Related Questions