Kenan
Kenan

Reputation: 14094

Update column based on other column condition

I need to update vid or maybe create a new column based on the change column

df = [{'vid': 14, 'change': 0}, {'vid': 15, 'change': 1}, {'vid': 16, 'change': 0}, {'vid': 16, 'change': 0}, {'vid': 17, 'change': 0}, {'vid': 17, 'change': 1}, {'vid': 18, 'change': 0}, {'vid': 18, 'change': 0}]

    vid  change
0    14       0
1    15       1
2    16       0
3    16       0
4    17       0
5    17       1
6    18       0
7    18       0

If change == 1 then the next set of vid should be changed to the current and if change == 0 then the next vid should stay the same.

From my example above vid 16 needs to be changed to 15 since 15 has change = 1 however 17 stays since 16 doesn't have change = 1

Change = 1 will only occur when vid is going to change in the next row

Expected output

    vid  change
0    14       0
1    15       1
2    15       0
3    15       0
4    17       0
5    17       1
6    17       0
7    17       0

Upvotes: 1

Views: 110

Answers (5)

zelusp
zelusp

Reputation: 3678

Well, why not give ya another option. This shows each transformation step in a single dataframe:

import pandas as pd

df = pd.read_clipboard()

def subtract_one(group):
    return group['vid'] - 1

df['shifted_change'] = df['change'].shift(1)

df['desired_result'] = df.groupby('vid').apply(lambda x: subtract_one(x) if x['shifted_change'].any() == 1 else x['vid']).rename('temp').reset_index()['temp']

... here's the result

enter image description here

EDIT

I've profiled each answer to see which one completes fastest. Results:

@Randy is the winner!

enter image description here

.. and I'm dead last, haha :) Nice work everybody!

Here's the profiling code:

import pandas as pd
import timeit

df = pd.read_clipboard()

def zelusp(df):
    def subtract_one(group):
        return group['vid'] - 1
    df['shifted_change'] = df['change'].shift(1)
    df['desired_result'] = df.groupby('vid').apply(lambda x: subtract_one(x) if x['shifted_change'].any() == 1 else x['vid']).rename('temp').reset_index()['temp']    
    return None

def WeNYoBen(df):
    s=df.groupby('vid').change.max().eq(1)
    df.vid=df.vid.replace(dict(zip(s[s.shift().fillna(False)].index,s.index[s])))
    return None

def Quang_Hoang (df):
    mask = (df['vid'].eq(df['vid'].shift())  # blocks
         | df.change.eq(1).shift()       # changes 
       )
    df['vid'] = df['vid'].mask(mask).ffill().astype('int64')
    return None

def Erfan (df):
    mask = df['change'].shift().eq(1)
    df.loc[mask, 'new_vid'] = df['vid'].shift()
    df['new_vid'] = df.groupby('vid')['new_vid'].ffill().fillna(df['vid'])
    return None

def Randy (df):
    df['vid'] = df['vid'] - df['vid'].map(df.groupby("vid")['change'].max().shift(fill_value=0))
    return None

%timeit for x in range(100): Randy(df.copy())
# 120 ms ± 283 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit for x in range(100): Quang_Hoang(df.copy())
# 177 ms ± 709 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit for x in range(100): WeNYoBen(df.copy())
# 183 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit for x in range(100): Erfan(df.copy())
# 311 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit for x in range(100): zelusp(df.copy())
# 578 ms ± 4.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Upvotes: 1

Randy
Randy

Reputation: 14847

If you don't mind renumbering all the replaced numbers to be sequential, you can get a pretty clean version of it by subtracting the shifted cumsum of change:

In [59]: df['new_vid'] = df['vid'] - df['change'].shift(fill_value=0).cumsum()

In [60]: df
Out[60]:
   vid  change  new_vid
0   14       0       14
1   15       1       15
2   16       0       15
3   16       0       15
4   17       0       16
5   17       1       16
6   18       0       16
7   18       0       16

Another option with the original numbering is to use map and groupby:

In [81]: df['vid'] = df['vid'] - df['vid'].map(df.groupby("vid")['change'].max().shift(fill_value=0))

In [82]: df
Out[82]:
   vid  change
0   14       0
1   15       1
2   15       0
3   15       0
4   17       0
5   17       1
6   17       0
7   17       0

Upvotes: 1

Erfan
Erfan

Reputation: 42916

  1. First we check where the changes happen with shift.
  2. Then we create a new column where we shift the values of vid one row down.
  3. Then we use GroupBy.ffill to get our desired columns:
mask = df['change'].shift().eq(1)
df.loc[mask, 'new_vid'] = df['vid'].shift()

df['new_vid'] = df.groupby('vid')['new_vid'].ffill().fillna(df['vid'])
   vid  change  new_vid
0   14       0     14.0
1   15       1     15.0
2   16       0     15.0
3   16       0     15.0
4   17       0     17.0
5   17       1     17.0
6   18       0     17.0
7   18       0     17.0

Step by step

Check where changes happen:

df['change'].shift().eq(1)

0    False
1    False
2     True
3    False
4    False
5    False
6     True
7    False
Name: change, dtype: bool

Create new column where we shift the values of vid:

df.loc[mask, 'new_vid'] = df['vid'].shift()

   vid  change  new_vid
0   14       0      NaN
1   15       1      NaN
2   16       0     15.0
3   16       0      NaN
4   17       0      NaN
5   17       1      NaN
6   18       0     17.0
7   18       0      NaN

Finally Groupby and forwardfill:

df['new_vid'] = df.groupby('vid')['new_vid'].ffill().fillna(df['vid'])

   vid  change  new_vid
0   14       0     14.0
1   15       1     15.0
2   16       0     15.0
3   16       0     15.0
4   17       0     17.0
5   17       1     17.0
6   18       0     17.0
7   18       0     17.0

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150735

IIUC, you can mask the changes together with the blocks, and ffill:

mask = (df['vid'].eq(df['vid'].shift())  # blocks
         | df.change.eq(1).shift()       # changes 
       )

df['vid'] = df['vid'].mask(mask).ffill().astype('int64')

Output:

   vid  change
0   14       0
1   15       1
2   15       0
3   15       0
4   17       0
5   17       1
6   17       0
7   17       0

Upvotes: 2

BENY
BENY

Reputation: 323226

Assume the dataframe is well construct like the sample we could do shift , then create the map dict and replace it

s=df.groupby('vid').change.max().eq(1)
df.vid=df.vid.replace(dict(zip(s[s.shift().fillna(False)].index,s.index[s])))
df
Out[65]: 
   vid  change
0   14       0
1   15       1
2   15       0
3   15       0
4   17       0
5   17       1
6   17       0
7   17       0

Upvotes: 2

Related Questions