Create a column in a pandas DataFrame using the previously computed value

Question

I have for example the following input DataFrame:

> df = pandas.DataFrame({'x': [1, 6, 8, 5, 2, 6, 12]})
> df
    x
0   1
1   6
2   8
3   5
4   2
5   6
6  12

And I would like to create the column y such that:

y[i] = 0 if x[i] < 4,

y[i] = 1 if x[i] > 6,

and y[i] = y[i - 1] if 4 <= x[i] <= 6.

So that with the example above the output would be:

    x  y
0   1  0
1   6  0
2   8  1
3   5  1
4   2  0
5   6  0
6  12  1

What is the best way to do this? A simple apply() does not seem to work, because I could not find a way to reference a previously computed value in the column that apply() is creating.

behzad.nouri · Accepted Answer

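You may use np.select followed by .ffill() (the older spelling, fillna(method='ffill'), is deprecated in recent pandas). Rows with 4 <= x <= 6 are left as NaN by np.select, and the forward fill then copies the last computed value into them:

>>> import numpy as np
>>> df['y'] = np.select([df['x'] < 4, 6 < df['x']], [0, 1], np.nan)
>>> df['y'] = df['y'].ffill().astype('int')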
>>> df
    x  y
0   1  0
1   6  0
2   8  1
3   5  1
4   2  0
5   6  0
6  12  1
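
One case the example data does not exercise: if the first row already falls in the 4 <= x <= 6 band, there is no previous value to carry forward, the NaN survives the forward fill, and the cast to int raises. The sketch below wraps the same np.select plus forward-fill idea in a function and seeds that leading gap with a default. The function name carry_forward_flag and the parameters low, high and initial are made up for illustration; they are not part of pandas or of the answer above, and choosing 0 as the seed is an assumption about what you want.

```python
import numpy as np
import pandas as pd


def carry_forward_flag(x, low=4, high=6, initial=0):
    """Return 0 where x < low, 1 where x > high, else the previous result.

    `initial` is a hypothetical default, used only when the series starts
    inside the [low, high] band and there is no previous value yet.
    """
    # NaN marks the rows that should inherit the previous result.
    flags = np.select([x < low, x > high], [0.0, 1.0], default=np.nan)
    y = pd.Series(flags, index=x.index)
    # Forward-fill the inherited rows, then seed any leading gap.
    return y.ffill().fillna(initial).astype(int)


df = pd.DataFrame({'x': [1, 6, 8, 5, 2, 6, 12]})
df['y'] = carry_forward_flag(df['x'])
print(df['y'].tolist())  # [0, 0, 1, 1, 0, 0, 1]

# A series that starts inside the band falls back to `initial`.
print(carry_forward_flag(pd.Series([5, 6, 8, 5])).tolist())  # [0, 0, 1, 1]
```

If a leading in-band row should be treated as an error instead, drop the fillna(initial) step and let the cast fail.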
