Pandas Operations on Columns based on other entries

Question

I have a pandas dataframe, and I need to create a column based on an existing column (not hard), but I need the ith value to be based on the (i-1)th value of the column. Example series:

data = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])

I want the ith element to be 1 if it is the start of a series of 1s, e.g.:

array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])

There are other operations I'd like to be able to do, but just understanding how to do this without iterating through would be incredibly helpful. I apologize if this has been asked, I wasn't sure how to search for it.

DSM · Accepted Answer

If a 1 is the start of a group, that means the element is 1 and the previous element isn't 1. This is a little easier to do in pandas than in pure numpy, because "the previous element isn't 1" can be expressed with shift, which moves all the data along the index (by default, one position forward).

In [15]: s = pd.Series([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])

In [16]: ((s == 1) & (s.shift() != 1)).astype(int)
Out[16]: 
0     0
1     0
2     0
3     1
4     0
5     0
6     0
7     1
8     0
9     0
10    0
11    1
12    0
13    0
dtype: int64
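
Since the question starts from a dataframe, here is a minimal sketch of the same idea applied to a column. The frame and the column names "flag" and "start" are hypothetical, chosen only for illustration; the logic is the shift comparison shown above.

```python
import pandas as pd

# Hypothetical frame; "flag" and "start" are illustrative column names.
df = pd.DataFrame({"flag": [0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1]})

# shift() moves every value down one row, so prev[i] holds flag[i - 1]
# (and NaN in the first row, where there is no previous value).
prev = df["flag"].shift()

# A run of 1s starts where the current value is 1 and the previous one is not.
df["start"] = ((df["flag"] == 1) & (prev != 1)).astype(int)

print(df["start"].tolist())
```

The whole column is computed with vectorized comparisons, so no explicit loop over rows is needed.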

Even the case where 1 is the first element will work: there is no element before it, so we get a NaN after shifting, and NaN != 1:

In [18]: s.shift().head()
Out[18]: 
0    NaN
1    0.0
2    0.0
3    0.0
4    1.0
dtype: float64
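
To check the leading-1 case directly, the sketch below uses a small made-up series that begins with 1. It also shows one possible variation, not part of the answer above: passing fill_value=0 to shift (available in reasonably recent pandas versions) so the shifted data stays integer instead of becoming float with a NaN.

```python
import pandas as pd

# Made-up example series that starts with a 1.
s = pd.Series([1, 1, 0, 1])

# Default shift: the first slot becomes NaN, and NaN != 1 evaluates to True,
# so a leading 1 is still flagged as the start of a run.
starts = ((s == 1) & (s.shift() != 1)).astype(int)

# Variation (an assumption, not from the answer): fill the gap with 0,
# which keeps the shifted series as integers.
shifted = s.shift(fill_value=0)
starts_filled = ((s == 1) & (shifted != 1)).astype(int)

print(starts.tolist())
print(starts_filled.tolist())
```

Both versions flag the same positions; the fill_value form only changes the dtype of the intermediate shifted series.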
