Reputation: 715
I have multiple CSV files in one directory, none of which have headers. I'm looking for a robust way to add the same header row to all of the files in the directory at once.
Sample.csv:
John Doe,Guitar,4 units
Desired output after adding headers 'name', 'product', 'quantity':
name,product,quantity
John Doe,Guitar,4 units
So far, I have found a way to add headers to a single file with pandas:
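from pandas import read_csv
# header=None: the file has no header row, so the first line is data, not column names
df = read_csv('/path/to/my/file/Sample.csv', header=None)
df.columns = ['name', 'product', 'quantity']
# index=False: don't write the row index out as an extra column
df.to_csv('/path/to/my/file/output.csv', index=False)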
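A quick way to see why the single-file snippet needs header=None: by default read_csv treats the first line as the header (header=0 unless names is passed), so a header-less file loses its first data row into the column labels. The sketch below uses an in-memory string via io.StringIO in place of Sample.csv; the second row ("Jane Roe,...") is made up for illustration.

```python
import io

import pandas as pd

# Header-less CSV content; only the first row comes from the question, the second is hypothetical.
raw = "John Doe,Guitar,4 units\nJane Roe,Piano,1 unit\n"

# Default (header=0): the first line is consumed as column labels.
default_df = pd.read_csv(io.StringIO(raw))
print(list(default_df.columns))  # ['John Doe', 'Guitar', '4 units']
print(len(default_df))           # 1

# header=None: every line is data; columns are numbered 0, 1, 2 until renamed.
df = pd.read_csv(io.StringIO(raw), header=None)
df.columns = ['name', 'product', 'quantity']
print(len(df))                   # 2
```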
Now I guess I need a loop that reads every file in the directory and adds the header row to each one. Could someone help me with this step, or suggest an easier approach if there is one? Thank you in advance.
I attempted to add a loop, but it throws an error:
import pandas as pd
import os
import glob
from pandas import read_csv
path = '/path/to/my/files/'
filelist = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list = []
frame = pd.DataFrame()
# whenever I run the line below, it throws this error -> IndentationError: expected an indented block
for file in filelist:
df2 = pd.read_csv(path+file)
df2.columns = ['name', 'product', 'quantity']
list.append(df2)
frame = pd.concat(list)
Upvotes: 0
Views: 2290
Reputation: 771
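read_csv has a names parameter that you can use to set the column names.
If you want to add the same header to every CSV you read, just pass the column names to the names parameter when you read each file. When names is given, read_csv treats the first line as data rather than as a header (the same as header=None), so no row is lost.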
df = pd.read_csv('test_.csv', names = ['name', 'product', 'quantity'])
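To check what names does, the sketch below reads an in-memory, header-less sample (standing in for test_.csv; the string is illustrative) and writes it back out. to_csv(index=False) with no path returns the CSV text, which now starts with the header line followed by the original row.

```python
import io

import pandas as pd

# Illustrative header-less content standing in for test_.csv.
raw = "John Doe,Guitar,4 units\n"

# Passing names implies header=None, so the first line stays a data row.
df = pd.read_csv(io.StringIO(raw), names=['name', 'product', 'quantity'])

# Writing it back without the index column produces the desired header row.
out = df.to_csv(index=False)
print(out)
```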
As for your code, you are doing more than you need to: there is no need to create an empty DataFrame at the beginning. Also, don't call your list "list": list is a built-in name in Python, and reusing it hides the built-in.
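A minimal illustration of why naming a variable list causes trouble: once the name is rebound at module level, later calls to list(...) reach your variable instead of the built-in type.

```python
# Rebinding "list" at module level hides the built-in type.
list = []
error_message = None
try:
    list("abc")  # calls the empty list object, not the built-in
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'list' object is not callable

# Deleting the module-level name makes the built-in visible again.
del list
restored = list("abc")
print(restored)  # ['a', 'b', 'c']
```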
You also don't need to add the path to each file name: glob.glob already returns the full path of every match, because the directory is part of the pattern.
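To see that glob.glob returns paths that already include the directory, the sketch below creates a few empty files in a temporary directory (the file names are made up) and globs for *.csv. It builds the pattern with os.path.join, which also avoids the double slash that path + "/*.csv" produces when path already ends in "/".

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical files: two CSVs and one file the pattern should skip.
    for name in ("a.csv", "b.csv", "notes.txt"):
        open(os.path.join(folder, name), "w").close()

    # Each match already contains the folder, so path + match would repeat it.
    matches = sorted(glob.glob(os.path.join(folder, "*.csv")))
    for match in matches:
        print(match)

print([os.path.basename(m) for m in matches])  # ['a.csv', 'b.csv']

# String concatenation vs os.path.join when the base already ends in "/".
base = "/path/to/my/files/"
print(base + "/*.csv")              # /path/to/my/files//*.csv
print(os.path.join(base, "*.csv"))  # /path/to/my/files/*.csv
```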
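Regarding the IndentationError: "expected an indented block" means Python found no indented statement after a line ending in a colon (here, for file in filelist:). Every line of the loop body must be indented, and consistently so; indentation errors also show up when you use spaces to indent one line and a tab for another. I would simply delete the indentation and add it back the same way on every line (for example, four spaces).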
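Both failure modes can be reproduced without running any pandas code by compiling small snippets. In the sketch below, check is a hypothetical helper; it relies on TabError being a subclass of IndentationError.

```python
def check(source):
    """Hypothetical helper: compile source without running it; return None or (error type, message)."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__, exc.msg
    return None


no_indent = "for f in ['a.csv']:\nprint(f)\n"                      # loop body not indented
mixed = "for f in ['a.csv']:\n\tprint(f)\n        print(f)\n"      # tab, then 8 spaces
consistent = "for f in ['a.csv']:\n    print(f)\n    print(f)\n"   # 4 spaces on every line

print(check(no_indent))   # ('IndentationError', 'expected an indented block ...')
print(check(mixed))       # ('TabError', 'inconsistent use of tabs and spaces in indentation')
print(check(consistent))  # None
```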
import glob
import os

import pandas as pd

path = '/path/to/my/files/'
# os.path.join avoids the double slash that path + "/*.csv" would produce
filelist = glob.glob(os.path.join(path, "*.csv"))
df_list = []
for file in filelist:
    # no need to add path: glob already returns the full path
    df2 = pd.read_csv(file, names=['name', 'product', 'quantity'])
    # overwrite each file in place, now with the header row
    df2.to_csv(file, index=False)
    df_list.append(df2)
frame = pd.concat(df_list)
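One caveat with writing back in place: if the loop runs a second time, the header line that was just added is read as a data row and another header is written above it. The sketch below is one way to guard against that, under the assumption that a file whose first line already equals the header should be left alone. The helper add_header and the temporary-directory demo files are hypothetical, for illustration only.

```python
import glob
import os
import tempfile

import pandas as pd

HEADER = ['name', 'product', 'quantity']


def add_header(folder, header=HEADER):
    """Hypothetical helper: add the header row to every header-less CSV in folder."""
    frames = []
    for file in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(file, newline="") as fh:
            first_line = fh.readline().strip()
        if first_line == ",".join(header):
            # Already has the header (e.g. the loop ran before): read it normally.
            df = pd.read_csv(file)
        else:
            df = pd.read_csv(file, names=header)
            df.to_csv(file, index=False)  # overwrite in place, now with a header row
        frames.append(df)
    # Note: pd.concat raises ValueError if no CSV files were found.
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as folder:
    # Made-up header-less sample files for the demo.
    for i, row in enumerate(["John Doe,Guitar,4 units", "Jane Roe,Piano,1 unit"]):
        with open(os.path.join(folder, f"part{i}.csv"), "w") as fh:
            fh.write(row + "\n")

    first = add_header(folder)
    second = add_header(folder)  # running again does not duplicate the header
    with open(os.path.join(folder, "part0.csv")) as fh:
        part0_lines = fh.read().splitlines()

print(len(first), len(second))  # 2 2
print(part0_lines)
```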
There is also an even shorter way to do this with a list comprehension, if you only need the combined DataFrame (this version does not write the headers back to the files):
import glob
import os

import pandas as pd

path = '/path/to/my/files/'
filelist = glob.glob(os.path.join(path, "*.csv"))
frame = pd.concat([pd.read_csv(file, names=['name', 'product', 'quantity']) for file in filelist])
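A small self-contained check of the one-liner: each file's rows keep their own 0-based index, so the combined frame has repeated index labels; passing ignore_index=True to pd.concat renumbers them. The temporary directory and its two files are made up for the demo.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Made-up header-less files for the demo.
    for i, row in enumerate(["John Doe,Guitar,4 units", "Jane Roe,Piano,1 unit"]):
        with open(os.path.join(folder, f"part{i}.csv"), "w") as fh:
            fh.write(row + "\n")

    filelist = sorted(glob.glob(os.path.join(folder, "*.csv")))
    names = ['name', 'product', 'quantity']

    # Default: each file's rows keep their own index, so labels repeat.
    stacked = pd.concat([pd.read_csv(f, names=names) for f in filelist])
    # ignore_index=True renumbers the combined rows 0..n-1.
    renumbered = pd.concat([pd.read_csv(f, names=names) for f in filelist], ignore_index=True)

print(list(stacked.index))     # [0, 0]
print(list(renumbered.index))  # [0, 1]
```

Whether the repeated labels matter depends on what you do next; for row lookups with .loc, the renumbered version is usually the safer choice.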
Upvotes: 1