Reputation: 615
I there's a lot on here about merging databases but I'm struggling with my problem.
I'm looking to create a database of racing results. I get all my data from csv files, there's csv files for the win market, and the place market, and 1 for each, on each day. So every date has 2 csv files, a win and a place.
I've attempted to start creating this data base by joining the last 2 days data together. Which has worked nicely so far.
import pandas as pd
win_1 = pd.read_csv('dwbfpricesukwin17092018.csv')
win_2 = pd.read_csv('dwbfpricesukwin16092018.csv')
place_1 = pd.read_csv('dwbfpricesukplace17092018.csv')
place_2 = pd.read_csv('dwbfpricesukplace16092018.csv')
win_data = win_1.append(win_2)
place_data = place_1.append(place_2)
place_data.rename(columns={'WIN_LOSE': 'WIN_LOSE_PLC', 'BSP': 'BSP_PLC'}, inplace=True)
total_data = win_data.merge(place_data[['WIN_LOSE_PLC', 'BSP_PLC', 'SELECTION_NAME']], on='SELECTION_NAME', how='outer')
total_data[['EVENT_DT', 'SELECTION_NAME', 'WIN_LOSE', 'BSP', 'WIN_LOSE_PLC', 'BSP_PLC']].to_csv('data.csv')
However I want to create a data base for the whole year.
I was thinking of creating two variables, start and end date and looping thru them, getting the loop to change the date in the csv file. But I kind of don't really know where to start.
start_date = '01012018'
end_date = '17092018'
How can I get the loop to know those two numbers are dates, and to go thru them 1 day at a time?
Also am I right to use .append here? I wanted to use join or merge but it would just create new columns instead of sending the new data to the bottom of the data base.
I hope I've explained myself enough, thanks in advance for any help.
Upvotes: 0
Views: 38
Reputation: 18647
You could use Pandas.date_range
with list comprehension.
Something like this:
win_template = 'dwbfpricesukwin{}.csv'
place_template = 'dwbfpricesukwin{}.csv'
start_date = '2018-01-01'
end_date = '2018-09-17'
dates = pd.date_range(start=start_date, end=end_date, freq='D')
win_files = [win_template.format(x.strftime('%d%m%Y')) for x in dates]
place_files = [place_template.format(x.strftime('%d%m%Y')) for x in dates]
Then continuing with list comprehensions, create your DataFrames using Pandas.concat
:
df_win = pd.concat([pd.read_csv(win) for win in win_files])
df_place = pd.concat([pd.read_csv(place) for place in place_files])
Upvotes: 2