Reputation: 21

How to check for row value in previous rows over date-time in pandas dataframe?

I'd like to take the following data, and check for each day whether the animal was observed the previous day, then create a count per day of new animals observed.

import pandas as pd
data = {'Date': pd.to_datetime(['18/08/2018', '18/08/2018', '18/08/2018',
                                '19/08/2018', '19/08/2018', '19/08/2018',
                                '19/08/2018', '19/08/2018', '20/08/2018',
                                '20/08/2018', '20/08/2018'], dayfirst=True),  # dates are DD/MM/YYYY
        'Animal': ['cat', 'dog', 'mouse', 'cat', 'dog', 'mouse',
                   'rabbit', 'rat', 'lion', 'tiger', 'monkey']}

df = pd.DataFrame(data)

With a result something like:

    1. 18/08/2018   3
    2. 19/08/2018   2
    3. 20/08/2018   3

I'm very new to Python, so any help very appreciated! Thx.

Upvotes: 2

Answers (2)

anky

Reputation: 75080

Here is another approach: aggregate each day's animals into a set, shift the sets by one day, and count the set difference between each day and the previous one.

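# Collect each day's animals into a set
m = df.groupby('Date')['Animal'].agg(set)
# Start from the number of distinct animals per day (correct for the first day)
n = m.str.len()
# For every later day, count the animals that were not in the previous day's set
n.iloc[1:] = [len(a.difference(b)) for a, b in zip(m, m.shift().fillna(m.head(1)))][1:]
print(n)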

Date
2018-08-18    3
2018-08-19    2
2018-08-20    3
dtype: int64
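
The same previous-day comparison can also be spelled out as an explicit loop, which may be easier to follow if you are new to pandas. This is only a minimal sketch, not the answer's own code: the names prev, counts and new_counts are made up for illustration, and it assumes that every animal on the first day counts as new.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(
        ['18/08/2018'] * 3 + ['19/08/2018'] * 5 + ['20/08/2018'] * 3,
        dayfirst=True),
    'Animal': ['cat', 'dog', 'mouse', 'cat', 'dog', 'mouse',
               'rabbit', 'rat', 'lion', 'tiger', 'monkey'],
})

counts = {}
prev = set()  # assumption: nothing was observed before the first day

# groupby yields the dates in sorted (chronological) order
for day, group in df.groupby('Date'):
    animals = set(group['Animal'])
    # animals seen today that were not seen on the previous date in the data
    counts[day] = len(animals - prev)
    prev = animals

new_counts = pd.Series(counts, name='new_animals')
print(new_counts)
```

Like the code above, this compares each date with the previous date present in the data, so a gap in the calendar is not treated specially.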

Upvotes: 3

yatu

Reputation: 88236

Here's one approach using pd.factorize:

s = (pd.Series(pd.factorize(df.Animal)[0]).groupby(df.Date).max()+1)
# decumulate and fill first row
s.diff().fillna(s) 

Date
2018-08-18    3.0
2018-08-19    2.0
2018-08-20    3.0
dtype: float64

Factorizing encodes each distinct animal as an integer code, numbered in order of first appearance:

pd.factorize(df.Animal)[0]
# array([0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7], dtype=int64)

Grouping by Date and taking the max code (plus 1, since the codes start at 0) gives the accumulated number of distinct animals seen up to each day:

Date
2018-08-18    3
2018-08-19    5
2018-08-20    8
dtype: int64

Finally, s.diff() decumulates the Series, and fillna(s) restores the first row, which has no previous day to subtract.
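
Putting the steps above together, a minimal end-to-end sketch might look like the following. The variable names (codes, cumulative, new_per_day) are illustrative, and the cummax() and astype(int) calls are additions that are not part of the original answer: cummax() guards against a day on which every animal had already been seen earlier (which would otherwise give a negative difference), and astype(int) simply avoids the float output.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(
        ['18/08/2018'] * 3 + ['19/08/2018'] * 5 + ['20/08/2018'] * 3,
        dayfirst=True),
    'Animal': ['cat', 'dog', 'mouse', 'cat', 'dog', 'mouse',
               'rabbit', 'rat', 'lion', 'tiger', 'monkey'],
})

# Integer code per animal, assigned in order of first appearance
codes, uniques = pd.factorize(df['Animal'])

# Highest code seen on each day, plus 1 = distinct animals seen so far
cumulative = pd.Series(codes, index=df.index).groupby(df['Date']).max() + 1
# Addition (not in the original answer): keep the running total non-decreasing
cumulative = cumulative.cummax()

# Decumulate; the first day has no predecessor, so keep its own total
new_per_day = cumulative.diff().fillna(cumulative).astype(int)
print(new_per_day)
```

Note that this counts animals that had not appeared on any earlier day, rather than only on the day before. For the sample data the two readings give the same result.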

Upvotes: 3
