Ayman Almuti

Reputation: 61

How to compute the average time in Python pandas

I have a CSV file that has time represented in a format I'm not familiar with:

image

I am trying to compute the average time in all of those rows (efforts shown below). Any sort of feedback will be appreciated.

import pandas as pd
import numpy as np
from datetime import datetime


flyer = pd.read_csv("./myfile.csv",parse_dates = ['timestamp'])

flyer.dropna(axis=0, how='any', thresh=None, subset=None, inplace=True)

pd.set_option('display.max_rows', 20)

flyer['timestamp'] = pd.to_datetime(flyer['timestamp'], 
infer_datetime_format=True)
p = flyer.loc[:,'timestamp'].mean()


print(flyer['timestamp'].mean())

Upvotes: 0

Views: 7730

Answers (2)

Jordan

Reputation: 375

When you read the csv with pandas, add parse_dates = ['timestamp'] to the pd.read_csv() function call and it will read in that column correctly. The T in the timestamp field is a common way to separate the date and the time.

The -4:00 is time zone information: a UTC offset, which in this case means the local time is 4 hours behind UTC.
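To make the T separator and the offset concrete, here is a minimal sketch. The timestamp string is a made-up example (the real values are only visible in the question's image), so treat its exact shape as an assumption.

```python
import pandas as pd

# Hypothetical value; the actual format in the CSV is only shown in the image.
raw = "2019-07-01T08:30:00-04:00"

# pandas parses the ISO 8601 string into a timezone-aware Timestamp.
ts = pd.Timestamp(raw)

print(ts)                    # local wall-clock time together with its UTC offset
print(ts.utcoffset())        # the offset from UTC as a timedelta
print(ts.tz_convert("UTC"))  # the same instant expressed in UTC
```

The date part comes before the T, the time part after it, and the trailing offset says how far the local clock is from UTC.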

As for calculating the mean time, that can get a bit tricky, but here's one solution for after you've imported the csv.

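import pandas as pd

ts = pd.to_datetime(df['timestamp'], utc=True)  # timezone-aware, normalized to UTC
seconds = (ts - pd.Timestamp('1970-01-01', tz='UTC')).dt.total_seconds()  # seconds since the epoch
pd.to_datetime(seconds.mean(), unit='s', utc=True)  # mean, converted back to a Timestamp

This converts the column to datetime values, expresses each one as total seconds since the Unix epoch, averages those seconds, and converts the result back into a pandas Timestamp.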

Upvotes: 0

Ben Dickson

Reputation: 306

The above is correct, but if you're new it might not be as clear what 0x is feeding you.

import pandas as pd

# turn your csv into a pandas dataframe
df = pd.read_csv('your/file/location.csv')

The timestamp column might be read in as plain strings, and you can't do the math you want on strings.

# this forces the column's data into timestamp variables
df['timestamp'] = pd.to_datetime(df['timestamp'], infer_datetime_format=True)

# now for your answer, get the average of the timestamp column
print(df['timestamp'].mean())
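For anyone who wants to try this without the original file, here is a small self-contained sketch. The rows in csv_text are invented sample data, and utc=True is one possible choice (an assumption, not something the question requires) so that the column ends up with a single datetime dtype.

```python
import io
import pandas as pd

# Made-up sample rows standing in for the real CSV.
csv_text = """timestamp,value
2019-07-01T08:00:00-04:00,1
2019-07-01T09:00:00-04:00,2
2019-07-01T10:00:00-04:00,3
"""

df = pd.read_csv(io.StringIO(csv_text))

# utc=True normalizes every row to UTC, so the column gets one
# timezone-aware datetime dtype even if the file mixes offsets.
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# The mean of a datetime column is itself a Timestamp.
mean_ts = df["timestamp"].mean()
print(mean_ts)
```

The printed result is in UTC; it can be shifted back to a local zone with tz_convert if needed.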

Upvotes: 1
