Adding a header row with values for each column to multiple CSV files

Question

I have multiple CSV files in one directory, none of which have headers. I'm looking for a robust way to add the same headers to all files in the directory at once.

Sample.csv:

 John Doe    Guitar    4 units

Desired output after adding headers 'name', 'product', 'quantity':

 name       product    quantity 
John Doe    Guitar     4 units

So far I have found a way to add headers to a single file with pandas:

from pandas import read_csv      
df = read_csv('/path/to/my/file/Sample.csv')
df.columns = ['name', 'product', 'quantity']
df.to_csv('/path/to/my/file/output.csv')

Now I guess I need a loop that reads all files in my directory and adds the desired header row to each. Could someone help me with this step, or suggest an easier approach if possible? Thank you in advance.

I attempted to add a loop, but it throws an error message:

import pandas as pd 
import os
import glob
from pandas import read_csv 
path = '/path/to/my/files/'
filelist = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list = []
frame = pd.DataFrame()
#whenever i run the below line it throws this error ->   IndentationError: expected an indented block
for file in filelist:
    df2 = pd.read_csv(path+file)
    df2.columns = ['name', 'product', 'quantity']
    list.append(df2)
frame = pd.concat(list)

jawsem · Accepted Answer

read_csv has a names parameter that you can use to set the column names.

If you want to add the same header to every CSV you read, you can just pass the column names to the names parameter when you read the files.


df = pd.read_csv('test_.csv', names = ['name', 'product', 'quantity'])
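
As a quick check of how names behaves, here is a minimal sketch that uses an in-memory CSV instead of a real file. The sample row comes from the question; io.StringIO is only a stand-in for a file on disk. When names is passed and header is not, pandas treats the input as having no header row, so the first line is kept as data.

```python
import io

import pandas as pd

# In-memory stand-in for a headerless CSV file (sample row from the question).
raw = io.StringIO("John Doe,Guitar,4 units\n")

df = pd.read_csv(raw, names=['name', 'product', 'quantity'])

# The first line is kept as data rather than being consumed as a header.
print(df.shape)  # (1, 3)
print(list(df.columns))
```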

Now for your code. You are doing too much here: you don't need to create a DataFrame at the beginning. Also, do not call your list "list"; list is the name of a built-in type in Python, and assigning to it shadows the built-in.

You also do not need to prepend the path to the file name; the list returned by glob already contains the full path you need.

Regarding the indentation error: make sure you are using consistent indentation. This sometimes happens if you indent one line with spaces and another with a tab. I would simply delete the indentation and add it back the same way on every line.
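
To see why the extra path is unnecessary, here is a small sketch that creates one empty file in a temporary directory and globs for it. The directory and the file name Sample.csv are just placeholders for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create an empty placeholder file to search for.
    open(os.path.join(folder, "Sample.csv"), "w").close()

    matches = glob.glob(os.path.join(folder, "*.csv"))

    # Each match already starts with the directory that was searched,
    # so prepending the directory again would produce a path that does not exist.
    print(os.path.dirname(matches[0]) == folder)
    print(os.path.basename(matches[0]))
```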

import glob
import os
import pandas as pd

path = '/path/to/my/files/'
columns = ['name', 'product', 'quantity']
# glob returns each match with the directory included, so there is no need to prepend path again
filelist = glob.glob(os.path.join(path, '*.csv'))
df_list = []
for file in filelist:
    # names= sets the column names and tells pandas the file has no header row
    df2 = pd.read_csv(file, names=columns)
    # save the file back out with the header row added (this overwrites the original)
    df2.to_csv(file, index=False)
    df_list.append(df2)
# optional: combine all files into a single DataFrame
frame = pd.concat(df_list)

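There is also a shorter way to build the combined DataFrame with a list comprehension. Note that this version only reads the files; it does not write the headers back to disk. See below.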

import glob
import os
import pandas as pd

path = '/path/to/my/files/'
filelist = glob.glob(os.path.join(path, '*.csv'))
frame = pd.concat([pd.read_csv(file, names=['name', 'product', 'quantity']) for file in filelist])
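
If the goal is only to rewrite the files in place, the read and write steps can be wrapped in a small function. The following is a rough sketch rather than a definitive implementation: add_headers is a hypothetical helper name (not part of pandas), and it assumes every file is comma-separated, has no header yet, and has as many fields per row as there are column names.

```python
import glob
import os

import pandas as pd


def add_headers(directory, columns):
    """Rewrite every .csv file in `directory` with `columns` as its header row.

    Hypothetical helper for illustration. Assumes the files are
    comma-separated and do not already contain a header row.
    Returns the list of files that were rewritten.
    """
    written = []
    for csv_path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        # dtype=str stops pandas from reinterpreting values such as "007" as numbers.
        df = pd.read_csv(csv_path, names=columns, dtype=str)
        df.to_csv(csv_path, index=False)  # overwrites the original file
        written.append(csv_path)
    return written
```

It could be called as add_headers('/path/to/my/files/', ['name', 'product', 'quantity']). Running it twice on the same directory would add a second header row, because the first header would then be read as data, so it is worth trying it on a copy of the files first.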
