durbachit
durbachit

Reputation: 4886

How to find a position of a last ocurrence of certain value in a pandas dataframe?

In a dataframe where one column is datetime and another one is only ones or zeros, how can I find the times of each of the last occurences of 1? For example:

times = pd.date_range(start="1/1/2015", end="2/1/2015",freq='D')
YN = np.zeros(len(times))
YN[0:8] = np.ones(len(YN[0:8]))
YN[12:20] = np.ones(len(YN[12:20]))
YN[25:29] = np.ones(len(YN[25:29]))
df = pd.DataFrame({"Time":times,"Yes No":YN})
print df

Which looks like

         Time  Yes No
0  2015-01-01     1.0
1  2015-01-02     1.0
2  2015-01-03     1.0
3  2015-01-04     1.0
4  2015-01-05     1.0
5  2015-01-06     1.0
6  2015-01-07     1.0
7  2015-01-08     1.0
8  2015-01-09     0.0
9  2015-01-10     0.0
10 2015-01-11     0.0
11 2015-01-12     0.0
12 2015-01-13     1.0
13 2015-01-14     1.0
14 2015-01-15     1.0
15 2015-01-16     1.0
16 2015-01-17     1.0
17 2015-01-18     1.0
18 2015-01-19     1.0
19 2015-01-20     1.0
20 2015-01-21     0.0
21 2015-01-22     0.0
22 2015-01-23     0.0
23 2015-01-24     0.0
24 2015-01-25     0.0
25 2015-01-26     1.0
26 2015-01-27     1.0
27 2015-01-28     1.0
28 2015-01-29     1.0
29 2015-01-30     0.0
30 2015-01-31     0.0
31 2015-02-01     0.0

How could I extract the dates that have the last occurrence of 1 before another series of zeros, in this case 8/1/2015, 20/1/2015 and 29/1/2015? This question addresses a similar problem, but I don't want all of the ones, I just want the last one before it changes to zero (and not only the one where it happens for the first time).

Upvotes: 2

Views: 104

Answers (4)

piRSquared
piRSquared

Reputation: 294278

numpy

v = df['Yes No'].values
df[(v - np.append(v[1:], 0) == 1)]

         Time  Yes No
7  2015-01-08     1.0
19 2015-01-20     1.0
28 2015-01-29     1.0

v = df['Yes No'].values
df.Time[(v - np.append(v[1:], 0) == 1)]

7    2015-01-08
19   2015-01-20
28   2015-01-29
Name: Time, dtype: datetime64[ns]

Upvotes: 2

FLab
FLab

Reputation: 7476

Here's an approach using pandas groupby. It could be useful if you plan to do many operations on this kind of data.

def find_consecutive(x, on = None, filter = None):
    # Group consecutive sequences 
    if on is None:
        on = x.columns
    return x.groupby([(x[on] != x[on].shift()).cumsum(), x[on].loc[:]])

grouped = df.pipe(lambda x: find_consecutive(x, on = 'Yes No'))

# For each sequence extract the last time
last_dates = grouped.last()\     # Explicitly: apply(lambda x: x['Time'].iloc[-1])\
                    .reset_index(level = 1, drop = False)

# A bit of formatting to extract only dates for "Yes" (there is probably 
# a cleaner way to do this)    
yes_last_dates =  last_dates.pipe(lambda x: x[x["Yes No"]==1]['Time'])\
                            .pipe(lambda x: x.reset_index(drop = True))

This gives the expected result:

0   2015-01-08
1   2015-01-20
2   2015-01-29

You can inspect grouped doing the following:

for key, group in grouped:
    print key, group

(1, 1.0)         Time  Yes No
0 2015-01-01     1.0
1 2015-01-02     1.0
2 2015-01-03     1.0
3 2015-01-04     1.0
4 2015-01-05     1.0
5 2015-01-06     1.0
6 2015-01-07     1.0
7 2015-01-08     1.0
(2, 0.0)          Time  Yes No
8  2015-01-09     0.0
9  2015-01-10     0.0
10 2015-01-11     0.0
11 2015-01-12     0.0
(3, 1.0)          Time  Yes No
12 2015-01-13     1.0
13 2015-01-14     1.0
14 2015-01-15     1.0
15 2015-01-16     1.0
16 2015-01-17     1.0
17 2015-01-18     1.0
18 2015-01-19     1.0
19 2015-01-20     1.0

....

Upvotes: 0

jezrael
jezrael

Reputation: 862681

You can use diff with eq for boolean mask and filter by boolean indexing:

print (df[df['Yes_No'].diff(-1).eq(1)])

         Time  Yes_No
7  2015-01-08     1.0
19 2015-01-20     1.0
28 2015-01-29     1.0

print (df.loc[df['Yes_No'].diff(-1).eq(1), 'Time'])

7    2015-01-08
19   2015-01-20
28   2015-01-29
Name: Time, dtype: datetime64[ns]

Upvotes: 4

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210842

you can use Series.shift(-1) in conjunction with Series.diff() methods

In [42]: df.loc[df['Yes No'].shift(-1).diff().eq(-1)]
Out[42]:
         Time  Yes No
7  2015-01-08     1.0
19 2015-01-20     1.0
28 2015-01-29     1.0

In [43]: df.loc[df['Yes No'].shift(-1).diff().eq(-1), 'Time']
Out[43]:
7    2015-01-08
19   2015-01-20
28   2015-01-29
Name: Time, dtype: datetime64[ns]

Explanation:

In [44]: df['Yes No'].shift(-1).diff()
Out[44]:
0     NaN
1     0.0
2     0.0
3     0.0
4     0.0
5     0.0
6     0.0
7    -1.0
8     0.0
9     0.0
10    0.0
11    1.0
12    0.0
13    0.0
14    0.0
15    0.0
16    0.0
17    0.0
18    0.0
19   -1.0
20    0.0
21    0.0
22    0.0
23    0.0
24    1.0
25    0.0
26    0.0
27    0.0
28   -1.0
29    0.0
30    0.0
31    NaN
Name: Yes No, dtype: float64

Upvotes: 4

Related Questions