chopin_is_the_best
chopin_is_the_best

Reputation: 2101

Average Time difference in pandas

I am trying to calculate the average time difference, in hours/minutes/seconds, iterating on a field - in my example, for each different ip address. Moreover, a column containing the count of each ip row.

My dataframe looks like:

date                  ipAddress
2016-08-08 00:39:00   98.249.244.22
2016-08-08 13:03:00   98.249.244.22
2016-08-20 21:37:00   98.211.135.179
2016-08-21 16:11:00   98.211.135.179
2016-08-21 16:19:00   98.211.135.179
2016-08-25 01:30:00   98.248.215.244

My desired output:

ipAddress         avg_time_diff    count
98.249.244.22     avg_diff_1         2
98.211.135.179    avg_diff_2         3
98.248.215.244    0                  1

A reproducicle df:

{u'date': {3233: Timestamp('2016-08-08 00:39:00'),
  3551: Timestamp('2016-08-08 13:03:00'),
  349036: Timestamp('2016-08-20 21:37:00'),
  349040: Timestamp('2016-08-21 16:11:00'),
  349049: Timestamp('2016-08-21 16:19:00'),
  378843: Timestamp('2016-08-25 01:30:00')},
 u'ipAddress': {3233: u'98.249.244.22',
  3551: u'98.249.244.22',
 49036: u'98.211.135.179',
 349040: u'98.211.135.179',
 349049: u'98.211.135.179',
 378843: u'98.248.215.244'}}

I have no clue where to start, it tried timediff but I am not sure I've understood how it works and how to iterate over rows as a "window function". Thanks in advance

Upvotes: 6

Views: 4394

Answers (1)

Alexander
Alexander

Reputation: 109546

See applying different functions to dataframe columns:

(df.groupby('ipAddress')
   .date
   .agg({'count': 'count', 
         'avg_time_diff': lambda group: group.sort_values().diff().mean()}))

# Output 
#                count  avg_time_diff
# ipAddress                           
# 98.211.135.179      2       00:08:00
# 98.248.215.244      1            NaT
# 98.249.244.22       2       12:24:00

Upvotes: 6

Related Questions