tryingToDevelopMyself
tryingToDevelopMyself

Reputation: 99

Using range of values in column to summarize values in another column based on condition for first column

I cannot find any explanation about numpy.where function.

I have dataframe:

     activity     count    
0       1           25   
1       2           38   
2       3           49   
3       4           55                       
4       5           61   
5       6           74   
6       7           82  
7       8           98   
8       9           103   
9       10          4084    

and I want to count values:

  1. for rows where activity < 6, sum count (so for index < 5)
  2. for rows where activity =< 6, sum count ( so for index =< 5)

and crate a charts for this two condition which shows sum count values. I have tried:

under_total = df[df.activity < 6]
under_total.loc['under_total'] = under_total.sum(axis = 0)

OR

df['below_expectations'] = np.where(df['actibity'] < 6, df['activity'], df['hoursActive'])

But it seems I dont get it somewhere.

Upvotes: 1

Views: 214

Answers (2)

jezrael
jezrael

Reputation: 863116

Use DataFrame.assign for new column by np.where and then aggregate sum:

s = (df.assign(cat = np.where(df['activity'] < 6, 'less 6', 'higher 6'))
       .groupby('cat')['count']
       .sum())
print (s)
cat
higher 6    4441
less 6       228
Name: count, dtype: int64

and then plot by Series.plot.bar:

s.plot.bar()

img

Upvotes: 0

mozway
mozway

Reputation: 261590

You can compute a group for GroupBy.sum to feed it to plot.bar:

(df.groupby(df['activity'].lt(6)
            .map({True: '<6', False: '≥6'}))
 ['count'].sum().plot.bar()
)

or with pandas.cut, which gives you the advantage to have as many categories as you want:

(df.groupby(pd.cut(df['activity'], bins=[0,6,float('inf')],
                   labels=['<6', '≥6'], right=False))
 ['count'].sum().plot.bar()
)

output:

enter image description here

intermediate data:

(df.groupby(df['activity'].lt(6)
            .map({True: '<6', False: '≥6'}))
 ['count'].sum()
)

output:

activity
<6     228
≥6    4441
Name: count, dtype: int64

Upvotes: 1

Related Questions