Jacob Steenhuysen
Jacob Steenhuysen

Reputation: 85

Reading in CSVs and how to write the name of the CSV file into every row of the CSV

I have about 2,000 CSV's I was hoping to read into a df but first I was wondering how someone would (before joining all the CSVs) write the name of every CSV in the every row of the CSV. Like for example, in CSV1, there would be a column that would say "CSV1" in every row. And same for CSV2, 3 etc.

Was wondering if there was a way to accomplish this?

import os
import glob
import pandas as pd
os.chdir(r"C:\Users\User\Downloads\Complete Corporate Financial History")

extension = 'csv'

all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])

The csv files all look like this: https://docs.google.com/spreadsheets/d/1hOb_nNjB3K8ldyyBUemQlcsTWcjyD8iLh8XMa5XB8Qk/edit?usp=sharing

They don't have the Ticker (file name) in each row though.

Edit: Here are the column headers: Quarter end Shares Shares split adjusted Split factor Assets Current Assets Liabilities Current Liabilities Shareholders equity Non-controlling interest Preferred equity Goodwill & intangibles Long-term debt Revenue Earnings Earnings available for common stockholders EPS basic EPS diluted Dividend per share Cash from operating activities Cash from investing activities Cash from financing activities Cash change during period Cash at end of period Capital expenditures Price Price high Price low ROE ROA Book value of equity per share P/B ratio P/E ratio Cumulative dividends per share Dividend payout ratio Long-term debt to equity ratio Equity to assets ratio Net margin Asset turnover Free cash flow per share Current ratio

and the rows descend by quarter.

Sample Data

,Quarter end,Shares,Shares split adjusted,Split factor,Assets,Current Assets,Liabilities,Current Liabilities,Shareholders equity,Non-controlling interest,Preferred equity,Goodwill & intangibles,Long-term debt,Revenue,Earnings,Earnings available for common stockholders,EPS basic,EPS diluted,Dividend per share,Cash from operating activities,Cash from investing activities,Cash from financing activities,Cash change during period,Cash at end of period,Capital expenditures,Price,Price high,Price low,ROE,ROA,Book value of equity per share,P/B ratio,P/E ratio,Cumulative dividends per share,Dividend payout ratio,Long-term debt to equity ratio,Equity to assets ratio,Net margin,Asset turnover,Free cash flow per share,Current ratio
0,6/30/2019,440000000.0,440000000.0,1.0,17900000000.0,6020000000.0,13000000000.0,3620000000.0,4850000000.0,12000000.0,55000000,5190000000.0,5900000000.0,3.69E+09,-1.20E+08,-1.20E+08,-0.27,-0.27,0.08,1.06E+08,1.29E+08,-2.00E+08,34000000,1360000000.0,128000000.0,22.55,25.83,19.27,0.0855,0.0243,10.9,1.98,16.11,33.46,0.2916,1.2296,0.2679,0.0311,0.78,-0.05,1.662
1,3/31/2019,449000000.0,449000000.0,1.0,18400000000.0,6050000000.0,13200000000.0,3660000000.0,5170000000.0,12000000.0,55000000,5420000000.0,5900000000.0,3.54E+09,1.87E+08,1.86E+08,0.4,0.39,0.08,-2.60E+08,42000000,-7.40E+08,-9.60E+08,1330000000.0,164000000.0,18.37,20.61,16.12,0.1298,0.0373,11.39,1.61,14.13,33.38,0.1798,1.1542,0.2784,0.0485,0.77,-0.94,1.6543
2,12/31/2018,485000000.0,485000000.0,1.0,18700000000.0,6580000000.0,13100000000.0,3520000000.0,5570000000.0,12000000.0,55000000,7250000000.0,5900000000.0,3.47E+09,2.18E+08,2.18E+08,0.45,0.45,0.06,4.26E+08,3.54E+08,-4.00E+07,7.40E+08,2280000000.0,-31000000.0,19.62,23.6,15.63,0.1208,0.035,11.38,1.79,None,33.3,0.1813,1.0685,0.2952,0.0457,0.76,0.94,1.8696
3,9/30/2018,483000000.0,483000000.0,1.0,18300000000.0,6130000000.0,13000000000.0,3010000000.0,5360000000.0,14000000.0,55000000,5470000000.0,6320000000.0,3.52E+09,1.61E+08,1.60E+08,0.33,0.32,0.06,51000000,65000000,-3.20E+07,82000000,1540000000.0,207000000.0,19.88,23.13,16.64,-0.0594,-0.0165,10.98,1.86,None,33.24,None,1.1902,0.2895,None,0.75,-0.32,2.0345
4,6/30/2018,483000000.0,483000000.0,1.0,18200000000.0,6080000000.0,13000000000.0,2980000000.0,5200000000.0,14000000.0,55000000,5480000000.0,6310000000.0,3.57E+09,1.20E+08,1.20E+08,0.25,0.24,0.06,1.76E+08,1.17E+08,-3.50E+07,2.52E+08,1460000000.0,166000000.0,20.27,24.07,16.47,-0.069,-0.0186,10.66,1.88,None,33.18,None,1.2259,0.2826,None,0.73,0.02,2.0406
5,3/31/2018,483000000.0,483000000.0,1.0,18200000000.0,5900000000.0,12900000000.0,2800000000.0,5270000000.0,14000000.0,55000000,5560000000.0,6310000000.0,3.45E+09,1.43E+08,1.42E+08,0.3,0.29,0.06,-4.40E+08,29000000,-5.40E+08,-9.50E+08,1210000000.0,117000000.0,26.87,31.17,22.57,-0.0536,-0.0134,10.8,2.67,None,33.12,None,1.2102,0.2861,None,0.7,-1.15,2.1039
6,12/31/2017,483000000.0,483000000.0,1.0,18700000000.0,6380000000.0,13800000000.0,2820000000.0,4910000000.0,14000000.0,55000000,7410000000.0,6810000000.0,3.27E+09,-7.30E+08,-7.30E+08,-1.51,-1.51,0.06,6.12E+08,-2.40E+08,-4.50E+07,3.35E+08,2150000000.0,236000000.0,25.3,27.85,22.74,-0.0232,-0.0038,10.06,2.07,None,33.06,None,1.4019,0.2594,None,0.67,0.78,2.2585
7,9/30/2017,481000000.0,481000000.0,1.0,19200000000.0,6150000000.0,13300000000.0,2680000000.0,5950000000.0,13000000.0,55000000,5250000000.0,6800000000.0,3.24E+09,1.19E+08,1.01E+08,0.23,0.22,0.06,1.72E+08,-1.30E+08,-1.50E+07,30000000,1820000000.0,131000000.0,24.76,26.84,22.67,-0.1222,-0.0308,12.24,1.92,None,33.0,None,1.1543,0.3063,None,0.65,0.09,2.2966
8,6/30/2017,441000000.0,441000000.0,1.0,19100000000.0,6030000000.0,13400000000.0,2660000000.0,5740000000.0,13000000.0,55000000,5220000000.0,6800000000.0,3.26E+09,2.12E+08,1.94E+08,0.44,0.43,0.06,2.17E+08,-1.30E+08,-8.60E+08,-7.70E+08,1790000000.0,125000000.0,25.2,28.65,21.75,-0.0899,-0.0231,12.89,2.05,None,32.94,None,1.1954,0.2976,None,0.61,0.21,2.2698
9,3/31/2017,441000000.0,441000000.0,1.0,20200000000.0,6710000000.0,14700000000.0,2590000000.0,5480000000.0,13000000.0,55000000,5170000000.0,8050000000.0,3.19E+09,3.22E+08,3.05E+08,0.69,0.65,0.06,-3.00E+08,1.03E+09,-4.30E+07,6.90E+08,2550000000.0,113000000.0,24.66,30.69,18.64,-0.0815,-0.0223,12.31,2.15,None,32.88,None,1.4826,0.2692,None,0.59,-0.94,2.5937
10,12/31/2016,441000000.0,441000000.0,1.0,20000000000.0,5890000000.0,14900000000.0,2750000000.0,5120000000.0,26000000.0,55000000,6940000000.0,8040000000.0,3.06E+09,-1.30E+09,-1.30E+09,-2.92,-2.92,7.76,6.62E+08,-2.40E+08,-4.00E+08,0,1860000000.0,302000000.0,24.43,32.1,16.75,-0.098,-0.029,11.49,0.91,None,32.82,None,1.5897,0.2525,None,0.57,0.82,2.1433
11,9/30/2016,438000000.0,438000000.0,1.0,37400000000.0,9370000000.0,23500000000.0,5500000000.0,11800000000.0,2170000000.0,55000000,5380000000.0,9500000000.0,5.21E+09,1.66E+08,1.48E+08,0.34,0.33,0.09,3.06E+08,-2.30E+08,-1.40E+08,-6.60E+07,1860000000.0,152000000.0,30,32.91,27.09,-0.0377,-0.0105,26.73,1.07,None,25.06,None,0.8107,0.313,None,0.57,0.35,1.7033
12,6/30/2016,1320000000.0,438000000.0,0.333333,36100000000.0,8090000000.0,21600000000.0,5490000000.0,12300000000.0,2190000000.0,55000000,5400000000.0,8280000000.0,5.30E+09,1.35E+08,1.18E+08,0.09,0.09,0.03,3.32E+08,3.11E+08,-1.00E+08,5.45E+08,1930000000.0,-50000000.0,30.42,34.5,26.34,-0.047,-0.0139,28.01,1.1,None,24.97,None,0.6741,0.3398,None,0.58,0.87,1.4747
13,3/31/2016,1320000000.0,438000000.0,0.333333,36100000000.0,7670000000.0,21800000000.0,5560000000.0,12200000000.0,2140000000.0,55000000,5400000000.0,8260000000.0,4.95E+09,16000000,-2000000,0,0,0.03,-4.30E+08,-1000000,-1.10E+08,-5.40E+08,1380000000.0,29000000.0,24.54,30.66,18.42,-0.0467,-0.0137,27.76,0.9,None,24.88,None,0.6784,0.3368,None,0.59,-1.05,1.3798
14,12/31/2015,1310000000.0,438000000.0,0.333333,36500000000.0,7950000000.0,22400000000.0,5210000000.0,12000000000.0,2090000000.0,55000000,7540000000.0,9040000000.0,5.25E+09,-7.00E+08,-7.20E+08,-0.55,-0.55,0.03,8.65E+08,-4.60E+08,-2.30E+08,1.80E+08,1920000000.0,398000000.0,28.48,33.54,23.43,-0.0324,-0.0089,27.36,0.99,25.66,24.79,None,0.7542,0.3283,None,0.62,1.07,1.5262

Upvotes: 1

Views: 63

Answers (2)

Alex Rajan Samuel
Alex Rajan Samuel

Reputation: 144

Can't think of an inbuilt way of doing this, but an alternative way is,

  1. expand your for loop and load the data frame to a variable
  2. create a column, df['fileName']=filename.split('.')[0], to get just the file name without the .csv.
  3. Then append this df to a list , this list will get appended every loop and after the loop completion just do a df.concat(list_csv, axis=0) to make one single df.

Replying from my phone so couldn't type in a working code, but it's easy if you think about it.

KR, Alex

Upvotes: 0

Adrian Keister
Adrian Keister

Reputation: 1025

You could try something like this, then:

df_list = []
for filename in all_filenames:
    df = pd.read_csv(filename)

    # Adds a column Ticker to the dataframe with the filename in the column.
    # The split function will work if no filename has more than one period.
    # Otherwise, you can use Python built-in function to trim off the extension.
    df['Ticker'] = filename.split('.')[0]
    df_list.append(df)

all_dfs = pd.concat(df_list, axis=0)

Upvotes: 1

Related Questions