Reputation: 43
Alright, so I want to get the difference between date and times based on the user that logged them. I can't seem to find anything online about this so this is my last resort. I have this:
timePage['Time spent on page'] = timePage['date'].diff()
It gives me the difference between the dates on my date column, but it also doesn't take into account the different users. I've grouped my data by user, then instance then date and want the code to reach the bottom of the user group and say that the next date shouldn't be compared to this date as it's a different user. Also, it doesn't work for the first index of my data. so how would I get it to compare the first and second index and then store that on the first indexed row?
Sample Data:
L = [{ "user" : "43tuhgn34",
"Instance" : "f34yh",
"Date" : "2019-02-14 12:54:52.000000+00:00"},
{ "user" : "43tuhgn34",
"Instance" : "wrg32",
"Date" : "2019-02-15 11:34:12.000000+00:00"},
{ "user" : "erg34g",
"Instance" : "f34yh",
"Date" : "2018-03-11 17:21:26.430000+00:00"}]
Upvotes: 1
Views: 351
Reputation: 862641
Use GroupBy.apply
with lambda function, bfill
here working per groups:
timePage = pd.DataFrame(L)
timePage['Date'] = pd.to_datetime(timePage['Date'])
timePage['Time spent on page'] = (timePage.groupby('user')['Date']
.apply(lambda x: x.diff().bfill()))
print (timePage)
user Instance Date Time spent on page
0 43tuhgn34 f34yh 2019-02-14 12:54:52+00:00 22:39:20
1 43tuhgn34 wrg32 2019-02-15 11:34:12+00:00 22:39:20
2 erg34g f34yh 2018-03-11 17:21:26.430000+00:00 NaT
If all groups has at least 2 rows is possible chain DataFrameGroupBy.diff
with bfill
, but be carefull, because bfill
here NOT working per groups:
timePage = pd.DataFrame(L)
timePage['Date'] = pd.to_datetime(timePage['Date'])
timePage['Time spent on page'] = timePage.groupby('user')['Date'].diff().bfill()
print (timePage)
Upvotes: 1