Reputation: 971
I have a following data frame:
id subid a
1 1 1 2
2 1 1 10
3 1 1 20
4 1 2 30
5 1 2 35
6 1 2 36
7 1 2 40
8 2 2 20
9 2 2 29
10 2 2 30
I want to apply say for example pandas diff() function on column "a", but the function should be reapplied whenever either "id" or "subid" is being changed, and want to store the values in a new column.
Below is the df I expect.
id subid a difference
1 1 1 2 NaN
2 1 1 10 8
3 1 1 20 10
4 1 2 30 NaN
5 1 2 35 5
6 1 2 36 1
7 1 2 40 4
8 2 2 20 NaN
9 2 2 29 9
10 2 2 30 1
As it can be observed at Row-4, and Row-8 either "id" or "subid" is changing, so NaN values are present and diff is calculated in successive rows.
Have used
df["difference"] = df["a"].diff()
which is obviously applied to the whole column, and not the way expected. I have tried using groupby, but it's somehow giving extra rows.
Thanks for any suggestions in advance.
Upvotes: 2
Views: 2082
Reputation: 19947
Setup
df = pd.DataFrame({'a': {1: 2, 2: 10, 3: 20, 4: 30, 5: 35, 6: 36, 7: 40, 8: 20, 9: 29, 10: 30},
'id': {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 2, 10: 2},
'subid': {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 1, 9: 1, 10: 1}})
Solution
#Check for each row if the id-subid pair has changed with previous row and then calculate diff accordingly
df['difference'] = df.apply(lambda x: x.a - df.ix[x.name-1].a
if (x.name>1 and x[['id','subid']].equals(df.ix[x.name-1][['id','subid']]))
else np.nan, axis=1)
df
Out[368]:
a id subid difference
1 2 1 1 NaN
2 10 1 1 8.0
3 20 1 1 10.0
4 30 1 2 NaN
5 35 1 2 5.0
6 36 1 2 1.0
7 40 1 2 4.0
8 20 2 1 NaN
9 29 2 1 9.0
10 30 2 1 1.0
Upvotes: 2
Reputation: 294258
This is a tricky one. According to your exact wording, you want to reset at every point in which either 'id'
or 'subid'
change. That means even if they change back and forth.
Also, the diff
calculation doesn't make a difference if done within a groupby
context, so I'll calculate it up front and mask when things change.
i = df.id.values
s = df.subid.values
i_chg = np.append(False, i[:-1] != i[1:])
s_chg = np.append(False, s[:-1] != s[1:])
df.assign(difference=df.a.diff().mask(i_chg | s_chg))
id subid a difference
1 1 1 2 NaN
2 1 1 10 8.0
3 1 1 20 10.0
4 1 2 30 NaN
5 1 2 35 5.0
6 1 2 36 1.0
7 1 2 40 4.0
8 2 1 20 NaN
9 2 1 29 9.0
10 2 1 30 1.0
Upvotes: 2
Reputation: 210842
try this:
In [97]: df['difference'] = df.groupby(['id','subid'])['a'].diff()
In [98]: df
Out[98]:
id subid a difference
1 1 1 2 NaN
2 1 1 10 8.0
3 1 1 20 10.0
4 1 2 30 NaN
5 1 2 35 5.0
6 1 2 36 1.0
7 1 2 40 4.0
8 2 1 20 NaN
9 2 1 29 9.0
10 2 1 30 1.0
Upvotes: 2