Neil
Neil

Reputation: 8247

how to groupby and filter in pandas

I have following dataframe in pandas

   from_exp     to_exp     salary
   5            7          10to12
   5            7          15to20
   0            3          1to3
   2            5          4to6
   5            9          7to8
   5            10         10to12

I want to see whar are the different salary ranges for from_exp is 5 years.

Desired dataframe

   from_exp       salary_range      count
   5              10to12            2       
   5              15to20            1
   5              7to8              1

I am doing following in pandas but it is not working.

df.groupby(['from_exp','salary_range']).filter(
           lambda x : x['from_exp'] == '5'
           )['salary'].size()

Upvotes: 2

Views: 164

Answers (3)

WebDev
WebDev

Reputation: 1371

You can do this to get your desired output:

df = df[df.from_exp == 5].groupby(['from_exp','salary']).agg({'salary':'count'}).rename(columns={'salary':'count'})
df = df.reset_index().rename(columns={'salary':'salary_range'})

print(df)

The output is:

   from_exp salary_range  count
0       5.0       10to12      2
1       5.0       15to20      1
2       5.0         7to8      1

Upvotes: 0

Andy L.
Andy L.

Reputation: 25239

I use this way to name the column as count

df.groupby(['from_exp','salary']).to_exp.count().loc[(5), :].reset_index(name='count')


Out[445]:
   from_exp  salary  count
0         5  10to12      2
1         5  15to20      1
2         5    7to8      1

Upvotes: 1

cs95
cs95

Reputation: 402523

You'll need to filter first, then group. You can use query to do everything in a single line:

(df.query('from_exp == 5')
   .groupby(['from_exp', 'salary'], as_index=False)
   .count())

   from_exp  salary  to_exp
0         5  10to12       2
1         5  15to20       1
2         5    7to8       1

Grouping on "from_exp" here is unnecessary because of the pre-filtering step, but I've done it here to reproduce your expected output.

If you're not particular on "from_exp" as the first column, use value_counts instead:

(df.query('from_exp == 5')['salary']
   .value_counts()
   .reset_index()
   .set_axis(['salary_range', 'count'], axis=1, inplace=False))

  salary_range  count
0       10to12      2
1       15to20      1
2         7to8      1

Upvotes: 2

Related Questions