Reputation: 49
I have this dataframe:
name color number
0 john red 4
1 ana red 4
2 ana red 5
3 paul red 6
4 mark red 3
5 ana yellow 10
6 john yellow 11
7 john yellow 12
8 john red 13
If the value in color column change (according to the name column), I want to create another column with the subtraction between the last value associated to the color and the first value from the new color. If the value in color column don't change, return -999. Ex: Looking to ana, the last value for red is 5 and the first value for yellow is 10. So, the new column will be 10 - 5 = 5 to ana. Looking to john, the last value for red is 4 and the first value for yellow is 11. So, the new column will be 11 - 4 = 7 to john. Do that just one time. If the color change again, it doesn't mather.
I want this output:
name color number difference
0 john red 4 7
1 ana red 4 5
2 ana red 5 5
3 paul red 6 -999
4 mark red 3 -999
5 ana yellow 10 5
6 john yellow 11 7
7 john yellow 12 7
8 john red 13 7
please, somebody help me?
Upvotes: 1
Views: 114
Reputation: 22031
try in this way
df = pd.DataFrame({'name':['john','ana','ana','paul','mark','ana','john','john','john'],
'color':['red','red','red','red','red','yellow','yellow','yellow','red'],
'number':[4,4,5,6,3,10,11,12,13]})
df['color_code'] = df['color'].factorize()[0]
partial_df = pd.DataFrame()
partial_df['difference'] = df.groupby('name')['number'].apply(lambda x: list(np.diff(x))).explode()
partial_df['change_status'] = df.groupby('name')['color_code'].apply(lambda x: list((np.diff(x)>0)+0)).explode()
map_difference = partial_df.loc[partial_df.change_status != 0].reset_index().drop_duplicates('name').set_index('name')['difference']
df['difference'] = df.name.copy().map(map_difference).fillna(-999)
df
Upvotes: 2