top bantz
top bantz

Reputation: 615

Building a database of time series using a loop

I there's a lot on here about merging databases but I'm struggling with my problem.

I'm looking to create a database of racing results. I get all my data from csv files, there's csv files for the win market, and the place market, and 1 for each, on each day. So every date has 2 csv files, a win and a place.

I've attempted to start creating this data base by joining the last 2 days data together. Which has worked nicely so far.

import pandas as pd

win_1 = pd.read_csv('dwbfpricesukwin17092018.csv')
win_2 = pd.read_csv('dwbfpricesukwin16092018.csv')
place_1 = pd.read_csv('dwbfpricesukplace17092018.csv')
place_2 = pd.read_csv('dwbfpricesukplace16092018.csv')

win_data = win_1.append(win_2)
place_data = place_1.append(place_2)

place_data.rename(columns={'WIN_LOSE': 'WIN_LOSE_PLC', 'BSP': 'BSP_PLC'}, inplace=True)

total_data = win_data.merge(place_data[['WIN_LOSE_PLC', 'BSP_PLC', 'SELECTION_NAME']], on='SELECTION_NAME', how='outer')

total_data[['EVENT_DT', 'SELECTION_NAME', 'WIN_LOSE', 'BSP', 'WIN_LOSE_PLC', 'BSP_PLC']].to_csv('data.csv')

However I want to create a data base for the whole year.

I was thinking of creating two variables, start and end date and looping thru them, getting the loop to change the date in the csv file. But I kind of don't really know where to start.

start_date = '01012018'
end_date = '17092018'

How can I get the loop to know those two numbers are dates, and to go thru them 1 day at a time?

Also am I right to use .append here? I wanted to use join or merge but it would just create new columns instead of sending the new data to the bottom of the data base.

I hope I've explained myself enough, thanks in advance for any help.

Upvotes: 0

Views: 38

Answers (1)

Chris Adams
Chris Adams

Reputation: 18647

You could use Pandas.date_range with list comprehension.

Something like this:

win_template = 'dwbfpricesukwin{}.csv'
place_template = 'dwbfpricesukwin{}.csv'

start_date = '2018-01-01'
end_date = '2018-09-17'
dates = pd.date_range(start=start_date, end=end_date, freq='D')

win_files = [win_template.format(x.strftime('%d%m%Y')) for x in dates]
place_files = [place_template.format(x.strftime('%d%m%Y')) for x in dates]

Then continuing with list comprehensions, create your DataFrames using Pandas.concat:

df_win = pd.concat([pd.read_csv(win) for win in win_files])
df_place = pd.concat([pd.read_csv(place) for place in place_files])

Upvotes: 2

Related Questions