Dataframe values conditional on multiple values

Question

I have the following df in pandas:

person  year  A      B
AA      1998  5
AA      1999  10
AA      2000  15 
XB      2010  100
CY      1980  3 
CY      1981  9
CY      1982  36
CY      1983  72
MJ      2017  120
MJ      2018  240

I'd like to iterate over each person in the df, in order by year, and compare their column A values by, for example, percentage difference and place that value in column B. For example, 1999 AA - the value is 200% bigger than the 1998 AA column A value. The 2000 AA value would be 150% and so on.

The first year that a person appears the column B value would be zero - for example, AA 1998 - the column B value would be zero as there is no previous year to compare it to.

I am wondering how to initiate an iteration to specifically look at the person, and then the year, and compare that year's column A value to the prior year's value. This would repeat until reaching a new person.

Ben Pap · Accepted Answer

dfshift = df.groupby('person')['A'].transform(lambda x: x.shift())
df['B'] = (df['A']/dfshift)*100
df['B'].fillna(0, inplace = True)


    person  year    A   B
0   AA     1998     5   0.0
1   AA     1999    10   200.0
2   AA     2000    15   150.0
3   XB     2010   100   0.0
4   CY     1980    3    0.0
5   CY     1981    9    300.0
6   CY     1982   36    400.0
7   CY     1983   72    200.0
8   MJ     2017  120    0.0
9   MJ     2018  240    200.0

This should do what you mean. Just do a minor transform on A to get the values you want to compare, then do the operation you want to do. Be warned though that your year values will have to be correctly sorted for this to work.

Dataframe values conditional on multiple values

Answers (1)

Related Questions