Reputation: 39
I have a large data base that looks like this:
id, Start Time, End Time
0, 2017-01-01 00:00:21, 2017-01-01 00:11:41
1, 2017-01-01 00:00:45, 2017-01-01 00:11:46
2, 2017-02-01 00:00:57, 2017-02-01 00:22:08
3, 2017-03-01 00:01:10, 2017-03-01 00:11:42
4, 2017-01-01 00:01:51, 2017-01-01 00:12:57
Using pandas would probably be easier to do this, but I don't have much experience with it. I have researched modules such as arrow
and datetime
and would like to filter the data based on the user's input. Using that input, the user is returned filtered data. For example:
def get_month('data.csv'):
month = input('\nWhich month? January, February, March, April, May, or June?\n')
date = '1 ' + month + ', 2017'
with open(city_data, 'r') as fin, open('userdata.csv', 'w') as fout:
writer = csv.writer(fout, delimiter=' ')
for row in csv.reader(fin, delimiter=' '):
if row[0] == arrow.get(date,'D MMMM, YYYY').format('YYYY-MM-DD'):
return writer.writerow(row)
Am I approaching this correctly? I think I may be going in the wrong direction in the date = '1 ' + month + ', 2017'
part. Is there a way I could filter the data with just an input like January
?
Upvotes: 3
Views: 1505
Reputation: 164823
For structured data, pandas
provides an efficient solution:
from datetime import datetime
import pandas as pd
# read data from file
df = pd.read_csv('data.csv')
# this creates a dataframe as below:
# id Start Time End Time
# 0 0 2017-01-01 00:00:21 2017-01-01 00:11:41
# 1 1 2017-01-01 00:00:45 2017-01-01 00:11:46
# 2 2 2017-02-01 00:00:57 2017-02-01 00:22:08
# 3 3 2017-03-01 00:01:10 2017-03-01 00:11:42
# 4 4 2017-01-01 00:01:51 2017-01-01 00:12:57
# cast string columns to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])
def get_month(df):
month = input('\nWhich month? January, February, March, April, May, or June?\n')
return df[df['Start Time'].dt.month == datetime.strptime(month, '%B').month]
get_month(df)
Upvotes: 3