Reputation: 37
I used the code below to group my pandas DataFrame by Hourly Rate Quartile and Hourly Rate:
e = df.groupby(['Hourly Rate Quartile', 'Hourly Rate']).size().reset_index(name='Count')
print(e)
This prints my three columns.
I now want to filter these results and print only the rows where Count > 1.
I have tried several approaches:
if e.loc[e['Count']] > 1:
    print(e)
Before that, I also used:
if e['Count'] > 1:
    print(e)
In both cases, I get a ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I also tried a for loop:
for i in e['Count']:
    if i > 1:
        print(i)
This gives me the right counts, but I would like to get all three columns.
So, when I try:
for i in e['Count']:
    if i > 1:
        print(e)
It prints everything again.
This is the last thing I've tried:
for i in e:
    if i['Count'] > 1:
        print(i)
Which gives me this error: string indices must be integers.
Do you guys have any ideas?
Upvotes: 0
Views: 51
Reputation: 675
In [1]: df = pd.DataFrame({'c1': list("aacd"), 'c2': list("bbcd")})
In [2]: df
Out[2]:
  c1 c2
0  a  b
1  a  b
2  c  c
3  d  d
In [3]: series = df.groupby(['c1', 'c2']).size()
In [4]: series
Out[4]:
c1  c2
a   b     2
c   c     1
d   d     1
dtype: int64
In [5]: series[series > 1]
Out[5]:
c1  c2
a   b     2
dtype: int64
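Applied to the column names from the question, the same boolean-mask idea might look like the sketch below. The sample values are made up for illustration; only the column names and the groupby call come from the question. Comparing a column to a number gives a boolean Series, which cannot be used in a plain `if` but can be used to index the frame.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Hourly Rate Quartile": [1, 1, 1, 2, 2],
    "Hourly Rate": [10.0, 10.0, 12.0, 20.0, 20.0],
})

# Same grouping as in the question: one row per (quartile, rate) pair.
e = (df.groupby(["Hourly Rate Quartile", "Hourly Rate"])
       .size()
       .reset_index(name="Count"))

# e["Count"] > 1 is a boolean Series; indexing with it keeps the matching
# rows together with all three columns.
filtered = e[e["Count"] > 1]
print(filtered)
```

No loop or `if` is needed here, since the mask selects the rows in one step.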
Upvotes: 1
Reputation: 898
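import pandas as pd

# Sample data: two rows for each value of col1
df = pd.DataFrame([['A', 5],
                   ['A', 4.],
                   ['B', 1],
                   ['B', 2]], columns=['col1', 'col2'])

# Count the non-null col2 values per col1 group, then merge the counts
# back onto the original rows as a new 'count' column
df = pd.merge(df,
              (df
               .groupby('col1')
               .count()
               .reset_index()
               .rename(columns={'col2': 'count'})),
              how='left',
              on='col1')

# Keep only the rows whose group has more than one entry
xx = df.loc[df['count'] > 1]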
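A possibly shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: `transform('size')` attaches the group size to every row directly, so no merge is needed. The sample data is slightly changed here (hypothetically) so that one group has a single row and the filter actually drops something.

```python
import pandas as pd

# Hypothetical sample data: 'A' appears twice, 'B' and 'C' once each.
df = pd.DataFrame([['A', 5],
                   ['A', 4.],
                   ['B', 1],
                   ['C', 2]], columns=['col1', 'col2'])

# transform('size') returns one value per original row: the size of
# the group that row belongs to.
df['count'] = df.groupby('col1')['col1'].transform('size')

# Keep only rows whose col1 value occurs more than once.
xx = df.loc[df['count'] > 1]
print(xx)
```

Note that `size` counts all rows in a group, whereas `count()` in the merge version counts only non-null values of `col2`; the two agree when there are no missing values.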
Upvotes: 0