Reputation: 14699
I have a pandas dataframe df with this settings
col1 col2
v1 i1
v1 i50
v2 i60
v2 i1
v2 i8
v10 i8
v10 i1
v10 i2
..
I would like to compute how many elments of col1 has a value of col2. And store the results into a dataframe with this setting
col1 frequency
i1 80
i2 195
... ...
I tried to do this in pandas,
item_frequency = pd.unique(relevant_data[relevant_data['col2'].isin(pd.unique(relevant_data['col2'].values.ravel()))]['col1'].values.ravel())
which is yielding the error
raise ValueError('Lengths must match to compare')
ValueError: Lengths must match to compare
PS: I'd like to do this in a vectorized manner.
Upvotes: 2
Views: 443
Reputation: 117550
It's not quite clear what result you want to get, so if you want to col1, col2, frequency
- then you can use groupby()
and size()
:
In [5]: df.groupby(['col1', 'col2']).size()
Out[5]:
col1 col2
v1 i1 1
i50 1
v10 i1 1
i2 1
i8 1
v2 i1 1
i60 1
i8 1
If you want just calculate count of col2
, then value_counts()
will work:
In [6]: df['col2'].value_counts()
Out[6]:
i1 3
i8 2
i60 1
i2 1
i50 1
dtype: int64
update
After you updated your description, I see that value_counts()
could give you wrong answer if it's possible to have one value more than once per transasction. But you can solve this with drop_duplicates()
:
In [9]: df.drop_duplicates()['col2'].value_counts()
Out[9]:
i1 3
i8 2
i60 1
i2 1
i50 1
dtype: int64
Upvotes: 1