AgentArachnid
AgentArachnid

Reputation: 43

Compare dates based on group in pandas

Alright, so I want to get the difference between date and times based on the user that logged them. I can't seem to find anything online about this so this is my last resort. I have this:

timePage['Time spent on page'] = timePage['date'].diff()

It gives me the difference between the dates on my date column, but it also doesn't take into account the different users. I've grouped my data by user, then instance then date and want the code to reach the bottom of the user group and say that the next date shouldn't be compared to this date as it's a different user. Also, it doesn't work for the first index of my data. so how would I get it to compare the first and second index and then store that on the first indexed row?

Sample Data:

L = [{ "user" : "43tuhgn34",
   "Instance" : "f34yh",
   "Date" : "2019-02-14 12:54:52.000000+00:00"},
 { "user" : "43tuhgn34",
   "Instance" : "wrg32",
   "Date" : "2019-02-15 11:34:12.000000+00:00"},
{ "user" : "erg34g",
   "Instance" : "f34yh",
   "Date" : "2018-03-11 17:21:26.430000+00:00"}]

Upvotes: 1

Views: 351

Answers (1)

jezrael
jezrael

Reputation: 862641

Use GroupBy.apply with lambda function, bfill here working per groups:

timePage = pd.DataFrame(L)

timePage['Date'] = pd.to_datetime(timePage['Date'])

timePage['Time spent on page'] = (timePage.groupby('user')['Date']
                                          .apply(lambda x: x.diff().bfill()))
print (timePage)
        user Instance                             Date Time spent on page
0  43tuhgn34    f34yh        2019-02-14 12:54:52+00:00           22:39:20
1  43tuhgn34    wrg32        2019-02-15 11:34:12+00:00           22:39:20
2     erg34g    f34yh 2018-03-11 17:21:26.430000+00:00                NaT

If all groups has at least 2 rows is possible chain DataFrameGroupBy.diff with bfill, but be carefull, because bfill here NOT working per groups:

timePage = pd.DataFrame(L)

timePage['Date'] = pd.to_datetime(timePage['Date'])

timePage['Time spent on page'] = timePage.groupby('user')['Date'].diff().bfill()
print (timePage)

Upvotes: 1

Related Questions