Subtracting minimum values of a certain pandas dataframe column based on another column

Question

I have a huge pandas DataFrame df, sorted by id and then year:

id        gender        year
3         male          1983
3         male          1983
3         male          1985
3         male          1990
6         female        1991
6         female        1992
7         male          1980
...
592873    female        1989
592873    female        1996
593001    male          2001
593428    female        2007
593428    female        2009

My goal is to create another column ca which is computed by:

year - minimum year of that id

Hence, the first six rows of df should return:

id        gender        year        ca
3         male          1983        0
3         male          1983        0
3         male          1985        2
3         male          1990        7
6         female        1991        0
6         female        1992        1

(In other words, I'm searching for a Pythonic answer to this question.)

One solution I could think of is to make a list and use a for loop:

ca_list = []

for i in range(len(df)):
  if df['id'][i] != df['id'][i-1]:
    num = df['year'][i]
    ca_list.append(0)
  else:
    ca_list.append(df['year'][i] - num)

df['ca'] = ca_list

But I believe there is a more optimal way to devise this. Any insights are much appreciated.

Andrej Kesely · Accepted Answer

Try:

df["ca"] = df.groupby("id")["year"].transform(lambda x: x - x.min())
print(df)

Prints:

   id  gender  year  ca
0   3    male  1983   0
1   3    male  1983   0
2   3    male  1985   2
3   3    male  1990   7
4   6  female  1991   0
5   6  female  1992   1

Subtracting minimum values of a certain pandas dataframe column based on another column

Answers (1)

Related Questions