Nahuel Patiño
Nahuel Patiño

Reputation: 89

Conditional way for a column to take the previous value of itself without resorting to a for loop

Basically I want to create a new column which replicates the last different value of another column.

I tried this but doesn't work.

import pandas as pd
import numpy as np

column1=  [1,2,2,2,3,3,3,3,3,3,2,2,2,2 ]
data = pd.DataFrame(column1, columns =['column1'])

data['column2'] =  np.where(data['column1'] != data['column1'].shift(1),  data['column1'].shift(1),data['column2'].shift(1))

Also tried this:

data['column2'] = [data['column2'].shift(1) if x != data['column1'].shift(1) else    data['column1'].shift(1) for x in data['column1']]

Desired output is as follows:

column1,column2
1,
2,1
2,1
2,1
3,2
3,2
3,2
3,2
3,2
3,2
2,3
2,3
2,3
2,3

Oh and while I'm sure I can do this with a for loop, I'm interested in finding a way to get this done without a loop.

Thanks

Edit: This method gets closer, but I ned to run these two lines of code once at a time, so it's extremely impractical.

data.loc[data['column1'] != data['column1'].shift(1), 'column2'] = data['column1'].shift(1)
data.loc[data['column1'] == data['column1'].shift(1), 'column2'] = data['column2'].shift(1)

Upvotes: 1

Views: 69

Answers (3)

masat
masat

Reputation: 350

This works for your specific example

data['column2'] = data.diff(1).apply(lambda r: data.loc[r.name - 1, 'column1'] if abs(r.column1) == 1 else None, axis=1).ffill()

How fast it is?

%%timeit
data.diff(1).apply(lambda r: data.loc[r.name - 1, 'column1'] if abs(r.column1) == 1 else None, axis=1).ffill()
1.41 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
s = data.column1.diff().ne(0).cumsum()
data.shift().groupby(s).column1.transform('first')
2.36 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Upvotes: 1

Andy L.
Andy L.

Reputation: 25259

try shift and groupby on custom groupID s and transform first

s = data.column1.diff().ne(0).cumsum()
data['column2'] = data.shift().groupby(s).column1.transform('first')

Out[374]:
    column1  column2
0         1      NaN
1         2      1.0
2         2      1.0
3         2      1.0
4         3      2.0
5         3      2.0
6         3      2.0
7         3      2.0
8         3      2.0
9         3      2.0
10        2      3.0
11        2      3.0
12        2      3.0
13        2      3.0

Upvotes: 3

Nahuel Patiño
Nahuel Patiño

Reputation: 89

Extremely slow 'solution' I just found

data['column2'] =np.NaN

    nacount=1
    nacount2=0
    while nacount != nacount2:
        nacount = data['column2'].isna().sum()
        data.loc[data['column1'] != data['column1'].shift(1), 'column2'] = data['column1'].shift(1)
        data.loc[data['column1'] == data['column1'].shift(1), 'column2'] = data['column2'].shift(1)
        nacount2 = data['column2'].isna().sum()

I'm not happy with it, excel does this formula faster.

Upvotes: 0

Related Questions