Reputation: 4886
In a dataframe where one column is datetime and another contains only ones and zeros, how can I find the times of each of the last occurrences of 1? For example:
import pandas as pd
import numpy as np

times = pd.date_range(start="1/1/2015", end="2/1/2015", freq='D')
YN = np.zeros(len(times))
YN[0:8] = 1
YN[12:20] = 1
YN[25:29] = 1
df = pd.DataFrame({"Time": times, "Yes No": YN})
print(df)
Which looks like
Time Yes No
0 2015-01-01 1.0
1 2015-01-02 1.0
2 2015-01-03 1.0
3 2015-01-04 1.0
4 2015-01-05 1.0
5 2015-01-06 1.0
6 2015-01-07 1.0
7 2015-01-08 1.0
8 2015-01-09 0.0
9 2015-01-10 0.0
10 2015-01-11 0.0
11 2015-01-12 0.0
12 2015-01-13 1.0
13 2015-01-14 1.0
14 2015-01-15 1.0
15 2015-01-16 1.0
16 2015-01-17 1.0
17 2015-01-18 1.0
18 2015-01-19 1.0
19 2015-01-20 1.0
20 2015-01-21 0.0
21 2015-01-22 0.0
22 2015-01-23 0.0
23 2015-01-24 0.0
24 2015-01-25 0.0
25 2015-01-26 1.0
26 2015-01-27 1.0
27 2015-01-28 1.0
28 2015-01-29 1.0
29 2015-01-30 0.0
30 2015-01-31 0.0
31 2015-02-01 0.0
How could I extract the dates of the last occurrence of 1 before each run of zeros, in this case 2015-01-08, 2015-01-20 and 2015-01-29? This question addresses a similar problem, but I don't want all of the ones, just the last one before the value changes to zero (and not only the first time that happens).
Upvotes: 2
Views: 104
Reputation: 294278
numpy
v = df['Yes No'].values
# keep rows where the value is 1 and the next value (0 past the end) is 0
df[(v - np.append(v[1:], 0) == 1)]
Time Yes No
7 2015-01-08 1.0
19 2015-01-20 1.0
28 2015-01-29 1.0
v = df['Yes No'].values
df.Time[(v - np.append(v[1:], 0) == 1)]
7 2015-01-08
19 2015-01-20
28 2015-01-29
Name: Time, dtype: datetime64[ns]
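The mask compares each value with its successor: appending a trailing 0 to the shifted array means a run of ones that reaches the end of the series still counts as ending. A minimal sketch of the same trick on a shorter toy array (for illustration only):

```python
import numpy as np

# v stands in for the 'Yes No' column values
v = np.array([1., 1., 0., 0., 1., 1., 1., 0.])
nxt = np.append(v[1:], 0)    # each element's successor, with 0 past the end
mask = (v - nxt) == 1        # True exactly where a 1 is followed by a 0
print(np.flatnonzero(mask))  # positions of the last 1 in each run
```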
Upvotes: 2
Reputation: 7476
Here's an approach using pandas groupby. It could be useful if you plan to do many operations on this kind of data.
def find_consecutive(x, on=None):
    # Group consecutive runs of equal values in column `on`
    if on is None:
        on = x.columns
    return x.groupby([(x[on] != x[on].shift()).cumsum(), x[on]])

grouped = df.pipe(lambda x: find_consecutive(x, on='Yes No'))

# For each run extract the last time
# (explicitly: grouped.apply(lambda x: x['Time'].iloc[-1]))
last_dates = grouped.last().reset_index(level=1, drop=False)

# A bit of formatting to extract only dates for "Yes" (there is probably
# a cleaner way to do this)
yes_last_dates = last_dates.pipe(lambda x: x[x["Yes No"] == 1]['Time'])\
                           .pipe(lambda x: x.reset_index(drop=True))
This gives the expected result:
0 2015-01-08
1 2015-01-20
2 2015-01-29
You can inspect grouped by doing the following:
for key, group in grouped:
    print(key, group)
(1, 1.0) Time Yes No
0 2015-01-01 1.0
1 2015-01-02 1.0
2 2015-01-03 1.0
3 2015-01-04 1.0
4 2015-01-05 1.0
5 2015-01-06 1.0
6 2015-01-07 1.0
7 2015-01-08 1.0
(2, 0.0) Time Yes No
8 2015-01-09 0.0
9 2015-01-10 0.0
10 2015-01-11 0.0
11 2015-01-12 0.0
(3, 1.0) Time Yes No
12 2015-01-13 1.0
13 2015-01-14 1.0
14 2015-01-15 1.0
15 2015-01-16 1.0
16 2015-01-17 1.0
17 2015-01-18 1.0
18 2015-01-19 1.0
19 2015-01-20 1.0
....
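For comparison, the same run-labelling idea can be written more compactly: label each run of consecutive equal values with a cumulative sum, take the last row of each run, then keep only the runs of ones. A sketch under that approach (variable names are my own):

```python
import pandas as pd
import numpy as np

# rebuild the example dataframe from the question
times = pd.date_range(start="1/1/2015", end="2/1/2015", freq="D")
YN = np.zeros(len(times))
YN[0:8] = 1
YN[12:20] = 1
YN[25:29] = 1
df = pd.DataFrame({"Time": times, "Yes No": YN})

# label each run of consecutive equal values
runs = df["Yes No"].ne(df["Yes No"].shift()).cumsum()
# last row of every run, then keep only the runs of ones
last_rows = df.groupby(runs).tail(1)
result = last_rows.loc[last_rows["Yes No"] == 1, "Time"]
print(result.dt.strftime("%Y-%m-%d").tolist())
```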
Upvotes: 0
Reputation: 862681
You can use diff with eq to build a boolean mask, then filter by boolean indexing:
print(df[df['Yes No'].diff(-1).eq(1)])
Time Yes No
7 2015-01-08 1.0
19 2015-01-20 1.0
28 2015-01-29 1.0
print(df.loc[df['Yes No'].diff(-1).eq(1), 'Time'])
7 2015-01-08
19 2015-01-20
28 2015-01-29
Name: Time, dtype: datetime64[ns]
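To see why this works: diff(-1) computes each value minus the next one, so the result equals 1 exactly where a 1 is followed by a 0. One edge case worth noting: the final position becomes NaN, so a run of ones extending to the very end of the series would not be flagged by this mask. A small sketch on a toy series:

```python
import pandas as pd

s = pd.Series([1., 1., 0., 0., 1., 1., 1., 0.])
d = s.diff(-1)           # s[i] - s[i+1]; the last element becomes NaN
print(d.eq(1).tolist())  # True at positions 1 and 6, the ends of each run
```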
Upvotes: 4
Reputation: 210842
You can use Series.shift(-1) in conjunction with the Series.diff() method:
In [42]: df.loc[df['Yes No'].shift(-1).diff().eq(-1)]
Out[42]:
Time Yes No
7 2015-01-08 1.0
19 2015-01-20 1.0
28 2015-01-29 1.0
In [43]: df.loc[df['Yes No'].shift(-1).diff().eq(-1), 'Time']
Out[43]:
7 2015-01-08
19 2015-01-20
28 2015-01-29
Name: Time, dtype: datetime64[ns]
Explanation:
In [44]: df['Yes No'].shift(-1).diff()
Out[44]:
0 NaN
1 0.0
2 0.0
3 0.0
4 0.0
5 0.0
6 0.0
7 -1.0
8 0.0
9 0.0
10 0.0
11 1.0
12 0.0
13 0.0
14 0.0
15 0.0
16 0.0
17 0.0
18 0.0
19 -1.0
20 0.0
21 0.0
22 0.0
23 0.0
24 1.0
25 0.0
26 0.0
27 0.0
28 -1.0
29 0.0
30 0.0
31 NaN
Name: Yes No, dtype: float64
Upvotes: 4