Reputation: 113
I have a Pandas DataFrame (see below) and want to sum values based on the index column, which contains strings. In the example below, I am trying to combine Moving, Playing and Using Phone into a single "Active Time" row holding the sum of their values, while keeping the other index values as they are. Any suggestions on how to handle this kind of scenario?
**Activity AverageTime**
Moving 0.000804367
Playing 0.001191772
Stationary 0.320701558
Using Phone 0.594305473
Unknown 0.060697612
Idle 0.022299218
Upvotes: 1
Views: 8602
Reputation: 36545
I would add a new boolean column called "active" and then group by that column:
# Flag the active rows by index label (avoids chained assignment)
df['active'] = df.index.isin(['Moving', 'Playing', 'Using Phone'])
df.groupby('active').AverageTime.sum()
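Grouping on a boolean flag also collapses all remaining rows into a single `False` row. If the other labels should stay as they are, one possible variation is to map the active labels to a common name and group on that. This is only a sketch: it assumes the activity names are the index, as in the question, and the helper name `relabel` and the label "Active Time" are illustrative choices.

```python
import pandas as pd

# Data from the question, with the activity names as the index
df = pd.DataFrame(
    {"AverageTime": [0.000804367, 0.001191772, 0.320701558,
                     0.594305473, 0.060697612, 0.022299218]},
    index=pd.Index(["Moving", "Playing", "Stationary",
                    "Using Phone", "Unknown", "Idle"], name="Activity"),
)

active_names = ["Moving", "Playing", "Using Phone"]

def relabel(name):
    # Hypothetical helper: active labels share one name, others are unchanged
    return "Active Time" if name in active_names else name

# A function passed to groupby is applied to each index label;
# sort=False keeps the groups in order of first appearance
result = df.groupby(relabel, sort=False)["AverageTime"].sum()
print(result)
```

The three active rows end up in one "Active Time" entry, and Stationary, Unknown and Idle each keep their own value.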
Upvotes: 0
Reputation: 1222
I am sure that there must be a simpler way of doing this, but here is one possible solution.
# Filters for active and inactive rows
active_row_names = ['Moving','Playing','Using Phone']
active_filter = [row in active_row_names for row in df.index]
inactive_filter = [not row for row in active_filter]
active = df.loc[active_filter].sum() # Sum of 'active' rows as a Series
active = pd.DataFrame(active).transpose() # as a dataframe, and fix orientation
active.index=["active"] # Assign new index name
# Keep the inactive rows as they are, and replace the active rows with the
# newly defined row that is the sum of the previous active rows.
df = pd.concat([df.loc[inactive_filter], active])  # DataFrame.append was removed in pandas 2.0
OUTPUT
Activity AverageTime
Stationary 0.320702
Unknown 0.060698
Idle 0.022299
active 0.596302
This works even when only a subset of the active row names is present in the DataFrame.
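The subset case can be checked with a small sketch. It uses `Index.isin` in place of the list-comprehension filters and `pd.concat` to attach the summed row. The sample frame is made up for illustration and deliberately omits 'Playing'.

```python
import pandas as pd

# Hypothetical sample: 'Playing' is deliberately missing from the index
df = pd.DataFrame(
    {"AverageTime": [0.25, 0.5, 0.125]},
    index=pd.Index(["Moving", "Stationary", "Using Phone"], name="Activity"),
)

active_row_names = ["Moving", "Playing", "Using Phone"]
is_active = df.index.isin(active_row_names)  # boolean mask, no KeyError for absent labels

# Sum the active rows into a one-row frame labelled "active"
active = df.loc[is_active].sum().to_frame().T
active.index = ["active"]

# Keep the inactive rows and attach the summed row at the end
combined = pd.concat([df.loc[~is_active], active])
print(combined)
```

Because the mask is built from whichever labels are actually present, a missing name such as 'Playing' simply contributes nothing to the sum.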
Upvotes: 3