Tim

Reputation: 321

pandas sum particular rows conditional on values in different rows

I have a data frame like this.

[Image: screenshot of the data frame, with one 5-row section highlighted in yellow and another in blue]

The rows come in groups of 5. The first row of each group tells me, through FieldA, whether to include the 4 rows that follow. For example, the yellow section is included and the blue one is not, purely because of the value in the first row of each.

I want to sum FieldB for every section whose first row has FieldA set to TRUE. In this example, I want to sum over the yellow section, because the first row of that section has TRUE in FieldA.

I can think of two approaches to do this, but don't know how to code either of them:

  1. First fill in the rest of FieldA with TRUE if the first of the 5 rows is TRUE. But I don't know how to do this.

  2. Use a filter that is based not on the row itself, but on the header row of its section. Again, I don't know how to do this.

Upvotes: 0

Views: 56

Answers (1)

iMS44

Reputation: 113


Here is a solution based on your option 1:

# Import pandas
import pandas as pd
import numpy as np

# Sample df
d = {
    'FieldA': [True, '', '', '', '', False, '', '', '', ''],
    'FieldB': [1, 2, 1, 4, 6, 5, 7, 9, 0, 1],
    'FieldC': [0.3, 0.2, 0.3, 0.2, 0.2, 0.3, 0.2, 0.3, 0.2, 0.2],
}
df = pd.DataFrame(data=d)

# Create a temporary column holding each row's distance from the last True/False row (index mod 5)
t_mod = []
for i in list(df.index.values):
    t_mod.append(i%5)

df['t_mod_c'] = np.array(t_mod)

# Fill in the missing True/False values in FieldA based on column t_mod_c
test = []
for i in df.index.values:
    test.append(df['FieldA'].loc[i-df['t_mod_c'].loc[i]])
df.drop(['t_mod_c'], axis=1, inplace = True)
df['FieldA'] = np.array(test)
df

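The same fill can probably be done without explicit loops. The following is only a sketch, assuming every group is exactly 5 rows with the header row first; `GROUP_SIZE`, `pos` and `header_pos` are names introduced here for illustration. It works on row positions rather than index labels, so it should not depend on the frame having a default index.

```python
import numpy as np
import pandas as pd

# Same sample data as above (FieldC omitted for brevity)
df = pd.DataFrame({
    'FieldA': [True, '', '', '', '', False, '', '', '', ''],
    'FieldB': [1, 2, 1, 4, 6, 5, 7, 9, 0, 1],
})

GROUP_SIZE = 5  # assumption: every group is exactly 5 rows, header first

# Position of each row, and the position of the header row of its group
pos = np.arange(len(df))
header_pos = pos - pos % GROUP_SIZE

# Copy each header's FieldA value onto every row of its group
df['FieldA'] = df['FieldA'].to_numpy()[header_pos].astype(bool)
```

This is the same `i - i % 5` lookup as in the loop version, applied to all rows at once.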

# Sum FieldB based on FieldA value
df[df['FieldA'] == True]['FieldB'].sum()
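Option 2 from the question (filtering on the header row without touching FieldA) could look roughly like the sketch below. It assumes groups of exactly 5 rows with the header first; `GROUP_SIZE`, `headers`, `mask` and `total` are illustrative names, not part of the original code.

```python
import numpy as np
import pandas as pd

# Same sample data as above (FieldC omitted for brevity)
df = pd.DataFrame({
    'FieldA': [True, '', '', '', '', False, '', '', '', ''],
    'FieldB': [1, 2, 1, 4, 6, 5, 7, 9, 0, 1],
})

GROUP_SIZE = 5  # assumption: every group is exactly 5 rows, header first

# Take FieldA from every 5th row (the header rows) as a boolean array
headers = df['FieldA'].iloc[::GROUP_SIZE].astype(bool).to_numpy()

# Repeat each header value once per row of its group to get a row-level mask
mask = np.repeat(headers, GROUP_SIZE)[:len(df)]

# Sum FieldB over the rows whose group header is True
total = df.loc[mask, 'FieldB'].sum()
print(total)
```

Here FieldA itself is left as it was, so the empty strings in the non-header rows are never read.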

Hope it helps!
If you have any questions, let me know.
Good luck!

Upvotes: 1
