Reputation: 14094
I need to update vid
or maybe create a new column based on the change
column
df = [{'vid': 14, 'change': 0}, {'vid': 15, 'change': 1}, {'vid': 16, 'change': 0}, {'vid': 16, 'change': 0}, {'vid': 17, 'change': 0}, {'vid': 17, 'change': 1}, {'vid': 18, 'change': 0}, {'vid': 18, 'change': 0}]
vid change
0 14 0
1 15 1
2 16 0
3 16 0
4 17 0
5 17 1
6 18 0
7 18 0
If change == 1
then the next set of vid should be changed to the current and if change == 0
then the next vid should stay the same.
From my example above vid 16 needs to be changed to 15 since 15 has change = 1 however 17 stays since 16 doesn't have change = 1
Change = 1 will only occur when vid is going to change in the next row
Expected output
vid change
0 14 0
1 15 1
2 15 0
3 15 0
4 17 0
5 17 1
6 17 0
7 17 0
Upvotes: 1
Views: 110
Reputation: 3678
Well, why not give ya another option. This shows each transformation step in a single dataframe:
import pandas as pd
df = pd.read_clipboard()
def subtract_one(group):
return group['vid'] - 1
df['shifted_change'] = df['change'].shift(1)
df['desired_result'] = df.groupby('vid').apply(lambda x: subtract_one(x) if x['shifted_change'].any() == 1 else x['vid']).rename('temp').reset_index()['temp']
... here's the result
EDIT
I've profiled each answer to see which one completes fastest. Results:
@Randy is the winner!
.. and I'm dead last, haha :) Nice work everybody!
Here's the profiling code:
import pandas as pd
import timeit
df = pd.read_clipboard()
def zelusp(df):
def subtract_one(group):
return group['vid'] - 1
df['shifted_change'] = df['change'].shift(1)
df['desired_result'] = df.groupby('vid').apply(lambda x: subtract_one(x) if x['shifted_change'].any() == 1 else x['vid']).rename('temp').reset_index()['temp']
return None
def WeNYoBen(df):
s=df.groupby('vid').change.max().eq(1)
df.vid=df.vid.replace(dict(zip(s[s.shift().fillna(False)].index,s.index[s])))
return None
def Quang_Hoang (df):
mask = (df['vid'].eq(df['vid'].shift()) # blocks
| df.change.eq(1).shift() # changes
)
df['vid'] = df['vid'].mask(mask).ffill().astype('int64')
return None
def Erfan (df):
mask = df['change'].shift().eq(1)
df.loc[mask, 'new_vid'] = df['vid'].shift()
df['new_vid'] = df.groupby('vid')['new_vid'].ffill().fillna(df['vid'])
return None
def Randy (df):
df['vid'] = df['vid'] - df['vid'].map(df.groupby("vid")['change'].max().shift(fill_value=0))
return None
%timeit for x in range(100): Randy(df.copy())
# 120 ms ± 283 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit for x in range(100): Quang_Hoang(df.copy())
# 177 ms ± 709 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit for x in range(100): WeNYoBen(df.copy())
# 183 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit for x in range(100): Erfan(df.copy())
# 311 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit for x in range(100): zelusp(df.copy())
# 578 ms ± 4.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Upvotes: 1
Reputation: 14847
If you don't mind renumbering all the replaced numbers to be sequential, you can get a pretty clean version of it by subtracting the shift
ed cumsum
of change:
In [59]: df['new_vid'] = df['vid'] - df['change'].shift(fill_value=0).cumsum()
In [60]: df
Out[60]:
vid change new_vid
0 14 0 14
1 15 1 15
2 16 0 15
3 16 0 15
4 17 0 16
5 17 1 16
6 18 0 16
7 18 0 16
Another option with the original numbering is to use map
and groupby
:
In [81]: df['vid'] = df['vid'] - df['vid'].map(df.groupby("vid")['change'].max().shift(fill_value=0))
In [82]: df
Out[82]:
vid change
0 14 0
1 15 1
2 15 0
3 15 0
4 17 0
5 17 1
6 17 0
7 17 0
Upvotes: 1
Reputation: 42916
changes
happen with shift
. shift
the values of vid
one row down. GroupBy.ffill
to get our desired columns:mask = df['change'].shift().eq(1)
df.loc[mask, 'new_vid'] = df['vid'].shift()
df['new_vid'] = df.groupby('vid')['new_vid'].ffill().fillna(df['vid'])
vid change new_vid
0 14 0 14.0
1 15 1 15.0
2 16 0 15.0
3 16 0 15.0
4 17 0 17.0
5 17 1 17.0
6 18 0 17.0
7 18 0 17.0
Step by step
Check where changes happen:
df['change'].shift().eq(1)
0 False
1 False
2 True
3 False
4 False
5 False
6 True
7 False
Name: change, dtype: bool
Create new column where we shift the values of vid
:
df.loc[mask, 'new_vid'] = df['vid'].shift()
vid change new_vid
0 14 0 NaN
1 15 1 NaN
2 16 0 15.0
3 16 0 NaN
4 17 0 NaN
5 17 1 NaN
6 18 0 17.0
7 18 0 NaN
Finally Groupby
and forwardfill
:
df['new_vid'] = df.groupby('vid')['new_vid'].ffill().fillna(df['vid'])
vid change new_vid
0 14 0 14.0
1 15 1 15.0
2 16 0 15.0
3 16 0 15.0
4 17 0 17.0
5 17 1 17.0
6 18 0 17.0
7 18 0 17.0
Upvotes: 2
Reputation: 150735
IIUC, you can mask the changes together with the blocks, and ffill
:
mask = (df['vid'].eq(df['vid'].shift()) # blocks
| df.change.eq(1).shift() # changes
)
df['vid'] = df['vid'].mask(mask).ffill().astype('int64')
Output:
vid change
0 14 0
1 15 1
2 15 0
3 15 0
4 17 0
5 17 1
6 17 0
7 17 0
Upvotes: 2
Reputation: 323226
Assume the dataframe is well construct like the sample we could do shift
, then create the map dict
and replace
it
s=df.groupby('vid').change.max().eq(1)
df.vid=df.vid.replace(dict(zip(s[s.shift().fillna(False)].index,s.index[s])))
df
Out[65]:
vid change
0 14 0
1 15 1
2 15 0
3 15 0
4 17 0
5 17 1
6 17 0
7 17 0
Upvotes: 2