riders994

Pandas Operations on Columns based on other entries

I have a pandas DataFrame, and I need to create a column based on an existing column (not hard), but the i-th value needs to depend on the (i-1)-th value of that column. Example series:

import numpy as np

data = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])

I want the i-th element to be 1 if it starts a run of 1s and 0 otherwise, e.g.:

array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])

There are other operations I'd like to do as well, but understanding how to do this without iterating over the elements would be incredibly helpful. I apologize if this has been asked before; I wasn't sure how to search for it.

Answers (2)

piRSquared

np.where

# [0 0 0 1 1 1 0 1 0 0 0 1 1 1] <- data
# [0 0 0 0 1 1 1 0 1 0 0 0 1 1] <- np.append(0, data[:-1])
#  ^ \___shifted data[:-1]___/
#  |
# appended zero
# [1 1 1 1 0 0 0 1 0 1 1 1 0 0] <- ~np.append(0, data[:-1]).astype(bool)
# [0 0 0 1 0 0 0 1 0 0 0 1 0 0] <- result

np.where(data & ~np.append(0, data[:-1]).astype(bool), 1, 0)

array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
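The `.astype(bool)` in the call above matters: on an integer array, `~` is bitwise NOT, so `~0` is `-1` and `~1` is `-2`, not a logical negation. Below is a minimal self-contained sketch of the same idea, assuming the input contains only 0s and 1s. The function name `run_starts_where` is hypothetical, and unlike the one-liner it casts both operands to bool before combining them.

```python
import numpy as np

def run_starts_where(data):
    """Return 1 where a run of 1s starts, 0 elsewhere (assumes a 0/1 array)."""
    data = np.asarray(data)
    # Previous element for each position; the first position gets 0.
    prev = np.append(0, data[:-1])
    # ~ acts as a logical NOT only on a boolean array.
    return np.where(data.astype(bool) & ~prev.astype(bool), 1, 0)

data = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])

# On the raw int array, ~ is bitwise NOT: ~0 == -1 and ~1 == -2.
print(~np.array([0, 1]))        # [-1 -2]
print(run_starts_where(data))   # [0 0 0 1 0 0 0 1 0 0 0 1 0 0]
```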

Using array multiplication

data * (1 - np.append(0, data[:-1]))

array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])

np.diff

(np.diff(np.append(0, data)) == 1).astype(int)

array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
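For 0/1 input the three NumPy expressions above compute the same result. The following sketch checks that they agree on random 0/1 arrays. The helper names, the seed, and the array length are arbitrary choices made for illustration.

```python
import numpy as np

def starts_where(data):
    # np.where version: current is 1 AND previous is not 1
    return np.where(data & ~np.append(0, data[:-1]).astype(bool), 1, 0)

def starts_mult(data):
    # multiplication version: data * (1 - previous)
    return data * (1 - np.append(0, data[:-1]))

def starts_diff(data):
    # np.diff version: a step of exactly +1 from the previous value
    return (np.diff(np.append(0, data)) == 1).astype(int)

rng = np.random.default_rng(0)
for _ in range(100):
    data = rng.integers(0, 2, size=20)  # random 0/1 array
    a, b, c = starts_where(data), starts_mult(data), starts_diff(data)
    assert np.array_equal(a, b) and np.array_equal(b, c)
print("all three agree")
```

The equivalence relies on the input being exactly 0 and 1 (or booleans). With other values the expressions can disagree, for example because the `np.diff` version only looks for a step of exactly +1.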

DSM

If a 1 is the start of a group, then it is 1 and the previous element isn't 1. This is a little easier to express in pandas than in pure NumPy, because "the previous element isn't 1" translates directly into a shift, which moves all the data (by default, one position forward).

In [15]: s = pd.Series([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])

In [16]: ((s == 1) & (s.shift() != 1)).astype(int)
Out[16]: 
0     0
1     0
2     0
3     1
4     0
5     0
6     0
7     1
8     0
9     0
10    0
11    1
12    0
13    0
dtype: int64
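Because the question asks about a DataFrame column rather than a standalone Series, here is a sketch of the same expression assigned as a new column. The frame and the column names `flag` and `run_start` are hypothetical.

```python
import pandas as pd

# Hypothetical frame; the column names "flag" and "run_start" are illustrative.
df = pd.DataFrame({"flag": [0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1]})

# A row starts a run when it is 1 and the previous row is not 1.
prev = df["flag"].shift()  # previous row's value; NaN for the first row
df["run_start"] = ((df["flag"] == 1) & (prev != 1)).astype(int)

print(df["run_start"].tolist())
# [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```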

This also handles a 1 in the first position: there is no element before it, so shifting produces NaN there, and NaN != 1 evaluates to True:

In [18]: s.shift().head()
Out[18]: 
0    NaN
1    0.0
2    0.0
3    0.0
4    1.0
