Reputation: 641
Is there a way to add rows to a dataframe based on other rows in the dataframe?
I have a dataframe that looks something like this:
Year Month Project Account Value
2020 Jan abc Revenue 100
2020 Jan xyz Profit 20
2020 Jan abc Profit 10
2020 Jan xyz Revenue 350
I want to perform a calculation for all lines that have the Revenue account and the formula should be Cost = Revenue - Profit for all columns (Year, Month, Project) that are the same (all rows are unique, so there would only be one possible match for each calculation).
So the result would look like
Year Month Project Account Value
2020 Jan abc Revenue 100
2020 Jan xyz Profit 20
2020 Jan abc Profit 10
2020 Jan xyz Revenue 350
2020 Jan abc Cost 90
2020 Jan xyz Cost 340
I'm pretty new to pandas but in all the searching I've done, I haven't found a pythonic way to do this with pandas. The data set is very large (100K+ rows) so I would rather not have to calculate iteratively, but if I have to I thought about using another data structure to store the dataframe (like a dictionary) and perform the calculations.
Upvotes: 3
Views: 83
Reputation: 150735
Use pivot
and stack
:
(df.pivot_table(index=['Year','Month', 'Project'],
columns=['Account'], values='Value'
)
.assign(Cost=lambda x: x['Revenue']-x['Profit'])
.stack().reset_index(name='Value')
)
Output:
Year Month Project Account Value
0 2020 Jan abc Profit 10
1 2020 Jan abc Revenue 100
2 2020 Jan abc Cost 90
3 2020 Jan xyz Profit 20
4 2020 Jan xyz Revenue 350
5 2020 Jan xyz Cost 330
Upvotes: 2
Reputation: 4130
Try this
grp = df.groupby(["Year", "Month", "Project", "Account"]).agg({"Value":"sum"}).reset_index()
costdf = grp.groupby(["Year", "Month", "Project"]).apply(lambda x: (x[x["Account"]=="Revenue"]["Value"].values[0] - x[x["Account"]=="Profit"]["Value"].values[0])).reset_index()
costdf.columns = ["Year", "Month", "Project","Value"]
costdf["Account"] = "Cost"
costdf = costdf[df.columns]
df = pd.concat([df,costdf])
Upvotes: 0