Reputation: 325
I have a dataset that looks like this:
"2018-05-30 21:26:43",20.61129150,-100.40933971
"2018-05-30 21:26:43",20.61127415,-100.41146822
"2018-06-02 21:56:12",21.15633228,-100.93766080
"2018-06-05 22:57:40",20.59734201,-100.38091286
"2018-06-05 22:57:40",20.59875096,-100.37821426
"2018-06-06 20:56:22",20.61278120,-100.38446619
"2018-06-06 20:56:22",20.59865452,-100.37827264
"2018-06-06 21:57:15",20.59862012,-100.37817348
"2018-06-06 21:57:15",20.59864713,-100.37821263
"2018-06-06 21:57:15",20.59862915,-100.37825902
"2018-06-07 15:54:29",20.61280757,-100.39768857
"2018-06-07 15:54:29",20.61276216,-100.39769379
I want to separate my data into per-day groups so I can calculate distances and come up with the average distance travelled per day.
I'm currently grouping it by my date column like this:
import pandas as pd
col_names = ['date', 'latitude', 'longitude']
df = pd.read_csv('marco.csv', names=col_names, sep=',', skiprows=1)
# merge
m = df.reset_index().merge(df.reset_index(), on='date')
However, I would like to separate it by day, so that I get indexes of
2018-05-30, 2018-06-05, 2018-06-06, 2018-06-07
How would I approach this problem?
Upvotes: 0
Views: 32
Reputation: 813
As Yuca mentioned, a group-by should do the trick. I would make a new column called "day" that contains just the date part of your timestamp, sort by date, group by "day", then calculate the distance traveled in each group.
import pandas as pd
a = pd.DataFrame(
[["2018-05-30 21:26:43",20.61129150,-100.40933971],
["2018-05-30 21:26:43",20.61127415,-100.41146822],
["2018-06-02 21:56:12",21.15633228,-100.93766080],
["2018-06-05 22:57:40",20.59734201,-100.38091286]],
columns=['date', 'lat', 'lng'])
a['date'] = pd.to_datetime(a['date'])
a['day'] = a['date'].dt.date
b = a.groupby('day')
# Loop over the groups and do whatever calculation you need
for day, group in b:
    # day is the group key (a datetime.date); group holds that day's rows
    print(group['lat'].sum())
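To go from the groups to an actual distance per day, something like the following might work. It is only a sketch under a few assumptions of mine: the distance is the great-circle (haversine) distance between consecutive points, the Earth radius is taken as 6371 km, and rows sharing a timestamp keep their file order. The helper names `haversine_km` and `daily_distance_km` are made up for this example and are not part of pandas.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(h))  # assumed mean Earth radius in km

def daily_distance_km(df):
    """Sum the distance between consecutive points within each calendar day."""
    # mergesort is stable, so rows with identical timestamps keep their order
    df = df.sort_values('date', kind='mergesort').copy()
    df['day'] = df['date'].dt.date
    # Previous point within the same day; the first point of a day gets NaN
    prev = df.groupby('day')[['lat', 'lng']].shift()
    df['step_km'] = haversine_km(prev['lat'], prev['lng'], df['lat'], df['lng'])
    # sum() skips NaN, so a day with a single point totals 0
    return df.groupby('day')['step_km'].sum()

a = pd.DataFrame(
    [["2018-05-30 21:26:43", 20.61129150, -100.40933971],
     ["2018-05-30 21:26:43", 20.61127415, -100.41146822],
     ["2018-06-02 21:56:12", 21.15633228, -100.93766080],
     ["2018-06-05 22:57:40", 20.59734201, -100.38091286]],
    columns=['date', 'lat', 'lng'])
a['date'] = pd.to_datetime(a['date'])

per_day = daily_distance_km(a)
print(per_day)         # one total per day, indexed by date
print(per_day.mean())  # average distance travelled per day
```

Note that days with only one point count as 0 km in the average; if you would rather ignore them, filter them out before calling `mean()`.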
Upvotes: 1