Dude_Scott
Dude_Scott

Reputation: 641

Add Dataframe Rows Based on Conditions of Other Rows

Is there a way to add rows to a dataframe based on other rows in the dataframe?

I have a dataframe that looks something like this:

Year   Month   Project   Account   Value
2020   Jan     abc       Revenue   100
2020   Jan     xyz       Profit    20
2020   Jan     abc       Profit    10
2020   Jan     xyz       Revenue   350

I want to perform a calculation for all lines that have the Revenue account and the formula should be Cost = Revenue - Profit for all columns (Year, Month, Project) that are the same (all rows are unique, so there would only be one possible match for each calculation).

So the result would look like

Year   Month   Project   Account   Value
2020   Jan     abc       Revenue   100
2020   Jan     xyz       Profit    20
2020   Jan     abc       Profit    10
2020   Jan     xyz       Revenue   350
2020   Jan     abc       Cost      90
2020   Jan     xyz       Cost      340

I'm pretty new to pandas but in all the searching I've done, I haven't found a pythonic way to do this with pandas. The data set is very large (100K+ rows) so I would rather not have to calculate iteratively, but if I have to I thought about using another data structure to store the dataframe (like a dictionary) and perform the calculations.

Upvotes: 3

Views: 83

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150735

Use pivot and stack:

(df.pivot_table(index=['Year','Month', 'Project'],
               columns=['Account'], values='Value'
               )
   .assign(Cost=lambda x: x['Revenue']-x['Profit'])
   .stack().reset_index(name='Value')
)

Output:

   Year Month Project  Account  Value
0  2020   Jan     abc   Profit     10
1  2020   Jan     abc  Revenue    100
2  2020   Jan     abc     Cost     90
3  2020   Jan     xyz   Profit     20
4  2020   Jan     xyz  Revenue    350
5  2020   Jan     xyz     Cost    330

Upvotes: 2

Rajith Thennakoon
Rajith Thennakoon

Reputation: 4130

Try this

grp = df.groupby(["Year",   "Month",   "Project",   "Account"]).agg({"Value":"sum"}).reset_index()
costdf = grp.groupby(["Year",   "Month",   "Project"]).apply(lambda x: (x[x["Account"]=="Revenue"]["Value"].values[0] - x[x["Account"]=="Profit"]["Value"].values[0])).reset_index()
costdf.columns = ["Year",   "Month",   "Project","Value"]
costdf["Account"] = "Cost"
costdf = costdf[df.columns]
df = pd.concat([df,costdf])

Upvotes: 0

Related Questions