Rebekah Scott
Rebekah Scott

Reputation: 35

Pandas Creating a Difference Column based on multiple values

I'm trying to create a difference column in Pandas based on comparing stats from one county to another, but the problem is that I have multiple years of data and I don't want to just subtract each entry by the overall average but by the comparison county's stat for that year.

example on excel

Here's a picture I had in mind. It's easy (though tedious) to do on excel, but you just subtract county 1 for that year from the other county values... I'm just wondering if there's a way to do this on pandas? I've looked into the diff method, but it doesn't seem useful and it's not like I can use the typical code like this (because we're going by year, not some aggregate statistic).

cols I have in my example: "Year" "County" "Stat 1" and I want to create col of "Stat diff b/t county 1"

This is code I've tried that dosen't work:

df["Diff Between County 1"] = df.loc[df.County == 'County 1', ['Year', "Stat 1"] ]- df["Stat 1"]

I can't figure out how to also group by year or something ... because County 1 has a different value every year.

I've also tried to just select the stats and years for the comparison county but I'm just not sure how to then take the difference by year. There must be some way to match up multiple conditions or something

df_wantedcounty = df.loc[df.County =='County 1', ['Stat 1', 'Year'] ]
df_wantedcounty = df_la.reset_index(drop = True)

Any help?

Upvotes: 0

Views: 48

Answers (1)

Michael S.
Michael S.

Reputation: 3128

Here you go, as a one liner!

### Recreate Dataframe ####
Year = [2010, 2010, 2010, 2022, 2022, 2022]
County = ["County 1", "County 2", "County 3", "County 1", "County 2", "County 3", ]
Stat_1 = [8312, 7816, 8185, 10176, 10666, 8145]
df = pd.DataFrame({"Year":Year, "County":County, "Stat_1":Stat_1})
############################


# Code to make new column
df["Stat diff b/t county 1"] = df.groupby('Year')['Stat_1'].apply(lambda x: x.iloc[0]-x)

Output:

    Year    County  Stat_1  Stat diff b/t county 1
0   2010    County 1    8312    0
1   2010    County 2    7816    496
2   2010    County 3    8185    127
3   2022    County 1    10176   0
4   2022    County 2    10666   -490
5   2022    County 3    8145    2031

Upvotes: 2

Related Questions