boberdorf
boberdorf

Reputation: 133

Iterate over Pandas dataframe cells to calculate running totals based on other cell values

I want to calculate a mass balance through a system in Python. In my current example, there are about 35,000 daily records, but I want to reuse the code for examples with much greater data.

I have a CSV file that captures daily data for:

I want to calculate new columns of data that look to both the previous and following days data. For example, I want to calculate additional columns that represent:

I am new to Python and thought I would utilize Pandas for this data, as I could keep all of the data together. I used np.where for simple conditional calculations using cells on the same row.

bal['Q_IL2TR'] = np.where(bal['Q_PS2IL']<FullTreatementFloMLDs, bal['Q_PS2IL'], FullTreatementFloMLDs)

Is it possible to iterate through a dataframe, and create new column values based on cells in columns from previous/following rows? Also is the best approach to the problem.

Upvotes: 0

Views: 518

Answers (1)

John Zwinck
John Zwinck

Reputation: 249394

Do not start by trying to iterate through a DataFrame. That's very slow. Instead, try to do everything in terms of vectors.

By the way, this:

bal['Q_IL2TR'] = np.where(bal['Q_PS2IL']<FullTreatementFloMLDs, bal['Q_PS2IL'], FullTreatementFloMLDs)

Should be more simply written as:

bal['Q_IL2TR'] = bal['Q_PS2IL'].clip(upper=FullTreatementFloMLDs)

As for things like this:

Irrigation - If rainfall in previous 3 days <10mm AND no rainfall is in following 5 days, irrigate a set volume

You can create rolling sums with window size 3 and 5, store those as new columns in your DataFrame, then use them to create a condition:

last3 = df.Rain.rolling(3, closed=False).sum() # n.b. you might want 4
next5 = df.Rain.rolling(5, closed=False).sum()
df['Irrigate'] = (last3 < 10) & (next5 == 0)

Upvotes: 2

Related Questions