Reputation: 1306
I have a pandas dataframe, and I need to create a column based on an existing column (not hard), but I need the i-th value to be based on the (i-1)-th value of the column. Example series:
data = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])
I want the i-th element to be 1 if it is the start of a series of 1s, e.g.:
array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
There are other operations I'd like to be able to do, but just understanding how to do this without iterating through would be incredibly helpful. I apologize if this has been asked, I wasn't sure how to search for it.
Upvotes: 2
Views: 65
Reputation: 294218
np.where
# [0 0 0 1 1 1 0 1 0 0 0 1 1 1] <- data
# [0 0 0 0 1 1 1 0 1 0 0 0 1 1] <- np.append(0, data[:-1])
#  ^ \___shifted data[:-1]___/
#  |
#  appended zero
# [1 1 1 1 0 0 0 1 0 1 1 1 0 0] <- ~np.append(0, data[:-1]).astype(bool), shown as 0/1
# [0 0 0 1 0 0 0 1 0 0 0 1 0 0] <- result
np.where(data & ~np.append(0, data[:-1]).astype(bool), 1, 0)
array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
Using array multiplication
data * (1 - np.append(0, data[:-1]))
array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
np.diff
(np.diff(np.append(0, data)) == 1).astype(int)
array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
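As a quick sanity check, the three NumPy variants above can be run side by side. This is only a sketch: the names `prev`, `via_where`, `via_mult`, and `via_diff` are made up for illustration, and it assumes `data` holds only 0s and 1s in a signed integer dtype (with an unsigned dtype the subtraction inside `np.diff` would wrap around instead of producing -1).

```python
import numpy as np

data = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])

# Previous element for every position, with a 0 placed before the first one.
prev = np.append(0, data[:-1])

# 1) Boolean mask: current is 1 and previous is not 1.
via_where = np.where(data & ~prev.astype(bool), 1, 0)

# 2) Arithmetic: data * (1 - prev) is 1 only when data == 1 and prev == 0.
via_mult = data * (1 - prev)

# 3) Difference: a 0 -> 1 transition is exactly a step of +1.
via_diff = (np.diff(np.append(0, data)) == 1).astype(int)

print(via_where)
print(via_mult)
print(via_diff)
```

All three should print the same array as the outputs shown above. The `np.diff` form has the side benefit that `== -1` would mark the position just after each run ends, if that is one of the other operations needed.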
Upvotes: 3
Reputation: 353009
If a 1 is the start of a group, that means it's 1 and the previous element isn't 1. This is a little easier to do in pandas than in pure numpy, because "the previous element isn't 1" can be translated using a shift, which moves all the data (by default, 1 position forward).
In [15]: s = pd.Series([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])
In [16]: ((s == 1) & (s.shift() != 1)).astype(int)
Out[16]:
0 0
1 0
2 0
3 1
4 0
5 0
6 0
7 1
8 0
9 0
10 0
11 1
12 0
13 0
dtype: int64
Even the case where 1 is the first element will work: there's no element before it, so shifting puts a NaN in that position, and NaN != 1 is True:
In [18]: s.shift().head()
Out[18]:
0 NaN
1 0.0
2 0.0
3 0.0
4 1.0
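Since the question is about a dataframe column, the same expression can be assigned straight to a new column. The following is a minimal sketch, not part of the original answer: the column names `flag`, `run_start`, and `run_start_alt` are hypothetical, and the second variant assumes a pandas version where `shift` accepts `fill_value`.

```python
import pandas as pd

# Hypothetical dataframe; "flag" stands in for the existing column.
df = pd.DataFrame({"flag": [0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1]})

# Same logic as above: current value is 1 and the previous value is not 1.
df["run_start"] = ((df["flag"] == 1) & (df["flag"].shift() != 1)).astype(int)

# Variant: fill the shifted-in slot with 0 instead of NaN, so the shifted
# column is not upcast to float.
prev = df["flag"].shift(fill_value=0)
df["run_start_alt"] = ((df["flag"] == 1) & (prev != 1)).astype(int)

print(df)
```

Both columns should match the result in Out[16]; the `fill_value` form is only a matter of keeping the intermediate values as integers.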
Upvotes: 2