Reputation: 61
I have a CSV file that has time represented in a format I'm not familiar with:
I am trying to compute the average time in all of those rows (efforts shown below). Any sort of feedback will be appreciated.
import pandas as pd
import numpy as np
from datetime import datetime
flyer = pd.read_csv("./myfile.csv",parse_dates = ['timestamp'])
flyer.dropna(axis=0, how='any', thresh=None, subset=None, inplace=True)
pd.set_option('display.max_rows', 20)
flyer['timestamp'] = pd.to_datetime(flyer['timestamp'],
                                    infer_datetime_format=True)
p = flyer.loc[:,'timestamp'].mean()
print(flyer['timestamp'].mean())
Upvotes: 0
Views: 7730
Reputation: 375
When you read the CSV with pandas, add parse_dates=['timestamp'] to the pd.read_csv() function call and it will read in that column correctly. The T in the timestamp field is the standard ISO 8601 separator between the date and the time.
The -4:00 is the time zone offset, which in this case means the times are 4 hours behind UTC.
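To make the date/time separator and the offset concrete, here is a minimal sketch. The timestamp string is a made-up value of the shape described above, since the question's actual sample isn't shown.

```python
import pandas as pd

# Hypothetical value; the real rows from the question's CSV are not shown.
raw = "2019-07-01T08:30:00-04:00"

# pandas parses the date, the time after the T, and the UTC offset in one go.
ts = pd.Timestamp(raw)

print(ts.utcoffset())        # the offset from UTC, as a timedelta
print(ts.tz_convert("UTC"))  # the same instant expressed in UTC
```

The parsed value keeps its offset, so converting it to UTC moves the wall-clock time forward by four hours without changing the instant it refers to.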
As for calculating the mean time, that can get a bit tricky, but here's one solution for after you've imported the csv.
import pandas as pd
ts = pd.to_datetime(df['timestamp'], utc=True)  # make sure every value is a UTC timestamp
mean_seconds = (ts - pd.Timestamp('1970-01-01', tz='UTC')).dt.total_seconds().mean()  # mean of seconds since the epoch
pd.to_datetime(mean_seconds, unit='s', utc=True)
This converts each timestamp to the number of seconds since the Unix epoch, takes the mean of those numbers, and then converts the result back into a pandas Timestamp (in UTC).
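If it helps, here is a small self-contained sketch of the "average the epoch seconds" idea. The three timestamps are made-up sample values, not the question's data, and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical sample values standing in for the 'timestamp' column.
stamps = pd.Series(pd.to_datetime([
    "2019-07-01T08:00:00-04:00",
    "2019-07-01T10:00:00-04:00",
    "2019-07-01T12:00:00-04:00",
], utc=True))

# Seconds elapsed since the Unix epoch for each row.
epoch = pd.Timestamp("1970-01-01", tz="UTC")
seconds = (stamps - epoch).dt.total_seconds()

# Average the plain numbers, then turn the result back into a timestamp.
mean_ts = pd.to_datetime(seconds.mean(), unit="s", utc=True)
print(mean_ts)

# On recent pandas versions the Series can also be averaged directly.
print(stamps.mean())
```

Both routes should land on the same instant; the epoch version just makes the arithmetic explicit.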
Upvotes: 0
Reputation: 306
The above is correct, but if you're new, it might not be clear what 0x is telling you.
import pandas as pd
# turn your csv into a pandas dataframe
df = pd.read_csv('your/file/location.csv')
The timestamp column might be read in as plain strings, and you won't be able to do the math you want on strings.
# this converts the column's strings into pandas datetime values
df['timestamp'] = pd.to_datetime(df['timestamp'])
# now for your answer, get the average of the timestamp column
print(df['timestamp'].mean())
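Putting the steps together, something like the following should run end to end. It is only a sketch: the CSV text and its id column are invented stand-ins for your file, and it assumes you want the answer shown at the same -04:00 offset as the input.

```python
import io
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical CSV contents standing in for the file from the question.
csv_text = """id,timestamp
1,2019-07-01T08:00:00-04:00
2,2019-07-01T10:00:00-04:00
3,2019-07-01T12:00:00-04:00
"""

df = pd.read_csv(io.StringIO(csv_text))

# utc=True gives a single timezone-aware dtype even if the offsets in the file differ.
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# Average of the column, expressed in UTC.
mean_utc = df["timestamp"].mean()
print(mean_utc)

# Same instant shown at the -04:00 offset used in the sample rows.
mean_local = mean_utc.tz_convert(timezone(timedelta(hours=-4)))
print(mean_local)
```

Swap io.StringIO(csv_text) for your file path to try it on the real data.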
Upvotes: 1