Reputation: 85
I have about 2,000 CSV's I was hoping to read into a df but first I was wondering how someone would (before joining all the CSVs) write the name of every CSV in the every row of the CSV. Like for example, in CSV1, there would be a column that would say "CSV1" in every row. And same for CSV2, 3 etc.
Was wondering if there was a way to accomplish this?
import os
import glob
import pandas as pd
os.chdir(r"C:\Users\User\Downloads\Complete Corporate Financial History")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
The csv files all look like this: https://docs.google.com/spreadsheets/d/1hOb_nNjB3K8ldyyBUemQlcsTWcjyD8iLh8XMa5XB8Qk/edit?usp=sharing
They don't have the Ticker (file name) in each row though.
Edit: Here are the column headers: Quarter end Shares Shares split adjusted Split factor Assets Current Assets Liabilities Current Liabilities Shareholders equity Non-controlling interest Preferred equity Goodwill & intangibles Long-term debt Revenue Earnings Earnings available for common stockholders EPS basic EPS diluted Dividend per share Cash from operating activities Cash from investing activities Cash from financing activities Cash change during period Cash at end of period Capital expenditures Price Price high Price low ROE ROA Book value of equity per share P/B ratio P/E ratio Cumulative dividends per share Dividend payout ratio Long-term debt to equity ratio Equity to assets ratio Net margin Asset turnover Free cash flow per share Current ratio
and the rows descend by quarter.
,Quarter end,Shares,Shares split adjusted,Split factor,Assets,Current Assets,Liabilities,Current Liabilities,Shareholders equity,Non-controlling interest,Preferred equity,Goodwill & intangibles,Long-term debt,Revenue,Earnings,Earnings available for common stockholders,EPS basic,EPS diluted,Dividend per share,Cash from operating activities,Cash from investing activities,Cash from financing activities,Cash change during period,Cash at end of period,Capital expenditures,Price,Price high,Price low,ROE,ROA,Book value of equity per share,P/B ratio,P/E ratio,Cumulative dividends per share,Dividend payout ratio,Long-term debt to equity ratio,Equity to assets ratio,Net margin,Asset turnover,Free cash flow per share,Current ratio
0,6/30/2019,440000000.0,440000000.0,1.0,17900000000.0,6020000000.0,13000000000.0,3620000000.0,4850000000.0,12000000.0,55000000,5190000000.0,5900000000.0,3.69E+09,-1.20E+08,-1.20E+08,-0.27,-0.27,0.08,1.06E+08,1.29E+08,-2.00E+08,34000000,1360000000.0,128000000.0,22.55,25.83,19.27,0.0855,0.0243,10.9,1.98,16.11,33.46,0.2916,1.2296,0.2679,0.0311,0.78,-0.05,1.662
1,3/31/2019,449000000.0,449000000.0,1.0,18400000000.0,6050000000.0,13200000000.0,3660000000.0,5170000000.0,12000000.0,55000000,5420000000.0,5900000000.0,3.54E+09,1.87E+08,1.86E+08,0.4,0.39,0.08,-2.60E+08,42000000,-7.40E+08,-9.60E+08,1330000000.0,164000000.0,18.37,20.61,16.12,0.1298,0.0373,11.39,1.61,14.13,33.38,0.1798,1.1542,0.2784,0.0485,0.77,-0.94,1.6543
2,12/31/2018,485000000.0,485000000.0,1.0,18700000000.0,6580000000.0,13100000000.0,3520000000.0,5570000000.0,12000000.0,55000000,7250000000.0,5900000000.0,3.47E+09,2.18E+08,2.18E+08,0.45,0.45,0.06,4.26E+08,3.54E+08,-4.00E+07,7.40E+08,2280000000.0,-31000000.0,19.62,23.6,15.63,0.1208,0.035,11.38,1.79,None,33.3,0.1813,1.0685,0.2952,0.0457,0.76,0.94,1.8696
3,9/30/2018,483000000.0,483000000.0,1.0,18300000000.0,6130000000.0,13000000000.0,3010000000.0,5360000000.0,14000000.0,55000000,5470000000.0,6320000000.0,3.52E+09,1.61E+08,1.60E+08,0.33,0.32,0.06,51000000,65000000,-3.20E+07,82000000,1540000000.0,207000000.0,19.88,23.13,16.64,-0.0594,-0.0165,10.98,1.86,None,33.24,None,1.1902,0.2895,None,0.75,-0.32,2.0345
4,6/30/2018,483000000.0,483000000.0,1.0,18200000000.0,6080000000.0,13000000000.0,2980000000.0,5200000000.0,14000000.0,55000000,5480000000.0,6310000000.0,3.57E+09,1.20E+08,1.20E+08,0.25,0.24,0.06,1.76E+08,1.17E+08,-3.50E+07,2.52E+08,1460000000.0,166000000.0,20.27,24.07,16.47,-0.069,-0.0186,10.66,1.88,None,33.18,None,1.2259,0.2826,None,0.73,0.02,2.0406
5,3/31/2018,483000000.0,483000000.0,1.0,18200000000.0,5900000000.0,12900000000.0,2800000000.0,5270000000.0,14000000.0,55000000,5560000000.0,6310000000.0,3.45E+09,1.43E+08,1.42E+08,0.3,0.29,0.06,-4.40E+08,29000000,-5.40E+08,-9.50E+08,1210000000.0,117000000.0,26.87,31.17,22.57,-0.0536,-0.0134,10.8,2.67,None,33.12,None,1.2102,0.2861,None,0.7,-1.15,2.1039
6,12/31/2017,483000000.0,483000000.0,1.0,18700000000.0,6380000000.0,13800000000.0,2820000000.0,4910000000.0,14000000.0,55000000,7410000000.0,6810000000.0,3.27E+09,-7.30E+08,-7.30E+08,-1.51,-1.51,0.06,6.12E+08,-2.40E+08,-4.50E+07,3.35E+08,2150000000.0,236000000.0,25.3,27.85,22.74,-0.0232,-0.0038,10.06,2.07,None,33.06,None,1.4019,0.2594,None,0.67,0.78,2.2585
7,9/30/2017,481000000.0,481000000.0,1.0,19200000000.0,6150000000.0,13300000000.0,2680000000.0,5950000000.0,13000000.0,55000000,5250000000.0,6800000000.0,3.24E+09,1.19E+08,1.01E+08,0.23,0.22,0.06,1.72E+08,-1.30E+08,-1.50E+07,30000000,1820000000.0,131000000.0,24.76,26.84,22.67,-0.1222,-0.0308,12.24,1.92,None,33.0,None,1.1543,0.3063,None,0.65,0.09,2.2966
8,6/30/2017,441000000.0,441000000.0,1.0,19100000000.0,6030000000.0,13400000000.0,2660000000.0,5740000000.0,13000000.0,55000000,5220000000.0,6800000000.0,3.26E+09,2.12E+08,1.94E+08,0.44,0.43,0.06,2.17E+08,-1.30E+08,-8.60E+08,-7.70E+08,1790000000.0,125000000.0,25.2,28.65,21.75,-0.0899,-0.0231,12.89,2.05,None,32.94,None,1.1954,0.2976,None,0.61,0.21,2.2698
9,3/31/2017,441000000.0,441000000.0,1.0,20200000000.0,6710000000.0,14700000000.0,2590000000.0,5480000000.0,13000000.0,55000000,5170000000.0,8050000000.0,3.19E+09,3.22E+08,3.05E+08,0.69,0.65,0.06,-3.00E+08,1.03E+09,-4.30E+07,6.90E+08,2550000000.0,113000000.0,24.66,30.69,18.64,-0.0815,-0.0223,12.31,2.15,None,32.88,None,1.4826,0.2692,None,0.59,-0.94,2.5937
10,12/31/2016,441000000.0,441000000.0,1.0,20000000000.0,5890000000.0,14900000000.0,2750000000.0,5120000000.0,26000000.0,55000000,6940000000.0,8040000000.0,3.06E+09,-1.30E+09,-1.30E+09,-2.92,-2.92,7.76,6.62E+08,-2.40E+08,-4.00E+08,0,1860000000.0,302000000.0,24.43,32.1,16.75,-0.098,-0.029,11.49,0.91,None,32.82,None,1.5897,0.2525,None,0.57,0.82,2.1433
11,9/30/2016,438000000.0,438000000.0,1.0,37400000000.0,9370000000.0,23500000000.0,5500000000.0,11800000000.0,2170000000.0,55000000,5380000000.0,9500000000.0,5.21E+09,1.66E+08,1.48E+08,0.34,0.33,0.09,3.06E+08,-2.30E+08,-1.40E+08,-6.60E+07,1860000000.0,152000000.0,30,32.91,27.09,-0.0377,-0.0105,26.73,1.07,None,25.06,None,0.8107,0.313,None,0.57,0.35,1.7033
12,6/30/2016,1320000000.0,438000000.0,0.333333,36100000000.0,8090000000.0,21600000000.0,5490000000.0,12300000000.0,2190000000.0,55000000,5400000000.0,8280000000.0,5.30E+09,1.35E+08,1.18E+08,0.09,0.09,0.03,3.32E+08,3.11E+08,-1.00E+08,5.45E+08,1930000000.0,-50000000.0,30.42,34.5,26.34,-0.047,-0.0139,28.01,1.1,None,24.97,None,0.6741,0.3398,None,0.58,0.87,1.4747
13,3/31/2016,1320000000.0,438000000.0,0.333333,36100000000.0,7670000000.0,21800000000.0,5560000000.0,12200000000.0,2140000000.0,55000000,5400000000.0,8260000000.0,4.95E+09,16000000,-2000000,0,0,0.03,-4.30E+08,-1000000,-1.10E+08,-5.40E+08,1380000000.0,29000000.0,24.54,30.66,18.42,-0.0467,-0.0137,27.76,0.9,None,24.88,None,0.6784,0.3368,None,0.59,-1.05,1.3798
14,12/31/2015,1310000000.0,438000000.0,0.333333,36500000000.0,7950000000.0,22400000000.0,5210000000.0,12000000000.0,2090000000.0,55000000,7540000000.0,9040000000.0,5.25E+09,-7.00E+08,-7.20E+08,-0.55,-0.55,0.03,8.65E+08,-4.60E+08,-2.30E+08,1.80E+08,1920000000.0,398000000.0,28.48,33.54,23.43,-0.0324,-0.0089,27.36,0.99,25.66,24.79,None,0.7542,0.3283,None,0.62,1.07,1.5262
Upvotes: 1
Views: 63
Reputation: 144
Can't think of an inbuilt way of doing this, but an alternative way is,
Replying from my phone so couldn't type in a working code, but it's easy if you think about it.
KR, Alex
Upvotes: 0
Reputation: 1025
You could try something like this, then:
df_list = []
for filename in all_filenames:
df = pd.read_csv(filename)
# Adds a column Ticker to the dataframe with the filename in the column.
# The split function will work if no filename has more than one period.
# Otherwise, you can use Python built-in function to trim off the extension.
df['Ticker'] = filename.split('.')[0]
df_list.append(df)
all_dfs = pd.concat(df_list, axis=0)
Upvotes: 1