And_Dev

Reputation: 113

Pandas df sum rows based on index column

I have a Pandas DataFrame (see below) and I want to sum values based on the index, which contains string values. In the example below, I am trying to combine Moving, Playing and Using Phone into a single "Active Time" row that holds the sum of their values, while keeping the other index values as they are. Any suggestions on how to handle this kind of scenario?

**Activity  AverageTime**
Moving      0.000804367 
Playing     0.001191772 
Stationary  0.320701558 
Using Phone 0.594305473 
Unknown     0.060697612 
Idle        0.022299218 

Upvotes: 1

Views: 8602

Answers (2)

maxymoo

Reputation: 36545

I would add a new boolean column called "active" and then groupby that column:

active_names = ['Moving', 'Playing', 'Using Phone']
# Flag rows whose index label is an active one (avoids chained assignment)
df['active'] = df.index.isin(active_names)
df.groupby('active')['AverageTime'].sum()
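To see what the boolean-flag approach returns on the question's data, here is a minimal sketch. The sample values are copied from the question; building the flag with `Index.isin` and the variable names `active_names` and `totals` are choices made for this illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; 'Activity' is the index.
df = pd.DataFrame(
    {"AverageTime": [0.000804367, 0.001191772, 0.320701558,
                     0.594305473, 0.060697612, 0.022299218]},
    index=pd.Index(["Moving", "Playing", "Stationary",
                    "Using Phone", "Unknown", "Idle"], name="Activity"),
)

# Illustrative name: the labels that should count as active.
active_names = ["Moving", "Playing", "Using Phone"]

# Boolean flag: True where the index label is one of the active names.
df["active"] = df.index.isin(active_names)

# One total per flag value (False = all other rows, True = active rows).
totals = df.groupby("active")["AverageTime"].sum()
print(totals)
```

Note that this yields only two totals: the rows that are not active are merged into a single False group rather than kept as separate rows, so it answers a slightly different question than the one asked.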

Upvotes: 0

ronrest

Reputation: 1222

I am sure that there must be a simpler way of doing this, but here is one possible solution.

# Filters for active and inactive rows
active_row_names = ['Moving','Playing','Using Phone']
active_filter = [row in active_row_names for row in df.index]
inactive_filter = [not row for row in active_filter]

active = df.loc[active_filter].sum()       # Sum of 'active' rows as a Series
active  = pd.DataFrame(active).transpose() # as a dataframe, and fix orientation
active.index=["active"]                    # Assign new index name

# Keep the inactive rows as they are, and replace the active rows with the
# newly defined row that is the sum of the previous active rows.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
df = pd.concat([df.loc[inactive_filter], active])

OUTPUT

Activity       AverageTime
Stationary     0.320702
Unknown        0.060698
Idle           0.022299
active         0.596302

This works even when only a subset of the active row names is present in the DataFrame.
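A possibly simpler alternative, offered only as a sketch: relabel the active rows with one shared index label and then group on the index. The sample data is copied from the question, the "Active Time" label is the one the question asks for, and the names `mapping` and `result` are hypothetical.

```python
import pandas as pd

# Sample data copied from the question; 'Activity' is the index.
df = pd.DataFrame(
    {"AverageTime": [0.000804367, 0.001191772, 0.320701558,
                     0.594305473, 0.060697612, 0.022299218]},
    index=pd.Index(["Moving", "Playing", "Stationary",
                    "Using Phone", "Unknown", "Idle"], name="Activity"),
)

# Map each active label to one shared label; labels that are not
# listed here are left unchanged by rename().
mapping = {name: "Active Time" for name in ["Moving", "Playing", "Using Phone"]}

# rename() creates duplicate "Active Time" labels; grouping on the index
# then sums them. sort=False keeps groups in order of first appearance.
result = df.rename(index=mapping).groupby(level=0, sort=False).sum()
print(result)
```

On this data the "Active Time" row should come out to about 0.596302, matching the summed row in the OUTPUT above, while Stationary, Unknown and Idle keep their own rows. Labels in `mapping` that are missing from the index are simply ignored, so this also tolerates a subset of the active names.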

Upvotes: 3
