Reputation: 23
I have about 7500 csv that needs to be merged into a single one in order to create an easy-readable table. The files format are as follow:
each file is denominated as a ticker of a stock (example: AA.csv, AAL.csv, AAPL.csv,etc...)
every file itself contains a date and a number in this format
2018-10-11,1
2018-10-12,3
2018-10-15,2
...
Now I want to merge them into a single csv file where the resulting table has in the header the name of the tickers, in the first column the dates and following the numbers (obviously keeping the csv format).
Example:
Note that some csv files are empty and some have differents starting dates or dates are just discontinued
Upvotes: 1
Views: 147
Reputation:
Since you have non-uniform csvs, and some specific requirements:
csv
module:import os,csv
csv_dir = "C:\\Users\Diego\\PycharmProjects\\thesis\\Tickers List\\"
csv_out = 'csvout.csv'
all_dates = []
fin_table = dict()
# iterating over all csv files
for csv_file in os.listdir(csv_dir):
if True:
csv_name = csv_file #[:-4] # excluding the ".csv" extension
table = dict()
print(csv_dir+csv_file)
with open(csv_dir+csv_file, 'r') as r:
csvReader = csv.reader(r)
for i, row in enumerate(csvReader):
all_dates.append(row[0])
table[row[0]] = row[1]
fin_table[csv_name] = table
# populating the final csv
with open(csv_out, 'w+') as w:
csvWriter = csv.writer(w)
# adding the header
input_row = ['Dates']
for stock, table in iter(sorted(fin_table.items())):
input_row.append(stock)
csvWriter.writerow(input_row)
# Adding the rest of the table by available date
# since, some dates may not have a value, as you said
for date in sorted(all_dates):
# adding the date rows and stock values 1 by 1
input_row = [date]
for stock, table in iter(sorted(fin_table.items())):
# NOVALUE will be put in cells
# where a value doesn't exist for a date
input_row.append(table.get(date, 'NOVALUE'))
csvWriter.writerow(input_row)
# ************************** end of program code
Upvotes: 0
Reputation: 891
You could do something like this:
import pandas as pd
import numpy as np
from glob import glob
dfs_list = []
for csv_file in glob('Tickers List/*.csv'):
stock_ticker = csv_file.split(".")[0]
df = pd.read_csv(csv_file,header=None, names=["date","num"])
if df.shape[0] >0:
df["date"] = pd.to_datetime(df["date"],format="%Y-%m-%d")
df["stock_ticker"] = stock_ticker
dfs_list.append(df)
final_df = pd.concat(dfs_list)
With glob('dir/*.csv')
we get all the csv files on a folder.
After this, you will have a DataFrame that looks like this:
If you want to change to your format, you can do this: (note that the dates are automatically sorted because is used as index).
final_df = pd.pivot_table(final_df, values='num', index=['date'],
columns=['stock_ticker'], fill_value=np.nan)
And you will have a DataFrame that looks like this:
Now you can write this DataFrame to a new csv with:
final_df.to_csv("merged.csv")
import pandas as pd
import numpy as np
from glob import glob
dfs_list = []
for csv_file in glob('Tickers List/*.csv'):
stock_ticker = csv_file.split(".")[0]
df = pd.read_csv(csv_file,header=None, names=["date","num"])
if df.shape[0] >0:
df["date"] = pd.to_datetime(df["date"],format="%Y-%m-%d")
df["stock_ticker"] = stock_ticker
dfs_list.append(df)
final_df = pd.concat(dfs_list)
final_df = pd.pivot_table(final_df, values='num', index=['date'],
columns=['stock_ticker'], fill_value=np.nan)
final_df.to_csv("merged.csv")
Upvotes: 1