Reputation: 18800
I have CSV files spread across multiple directories, and each CSV file has only one column of data. I want to read all of these files and bring each file's column into one CSV file. The final CSV file will have one column per source file, with the filename as the header and the data from that file as the column data.
This is my directory structure (the output of ls inside ~/csv_files/):
ab arc bat-smg bn cdo crh diq es fo gd haw ia iu ki ksh lez lv mo na no os pih rmy sah simple ss tet tr ur war zea
ace arz bcl bo ce cs dsb et fr gl he id ja kk ku lg map-bms mr nah nov pa pl rn sc sk st tg ts uz wo zh
af as
Each directory has two CSV files. I thought of using the os.walk() function, but I think my understanding of os.walk() is incorrect, which is why what I currently have doesn't produce anything.
import sys, os
import csv

root_path = os.path.expanduser('~/data/missing_files')

def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            for name in files:
                if name.endswith(".csv"):
                    csv_path = os.path.expanduser(root_path + name)
                    if os.path.exists(csv_path):
                        try:
                            with open(csv_path, 'rb') as f:
                                t = f.read().splitlines()
                                print t
                        except IOError, e:
                            print e

def main():
    combine_csv_files(root_path)

if __name__=="__main__":
    main()
My questions are:
In the end, I am trying to get a CSV file like this (here are the potential headers):
ab_csv_data_file1, ab_csv_data_file2, arc_csv_data_file1, arc_csv_data_file2
Upvotes: 0
Views: 684
Reputation: 1506
I am not sure I understand what you mean. Let's say you have multiple folders, such as "ab", "arc" and so on, and each folder contains two CSV files.
If that is right, then your code is not doing the right thing.
def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            # Walk each subdirectory of path in turn
            for dirpath, sub_dirs, sub_files in os.walk(os.path.join(path, dir)):
                for name in sub_files:
                    if name.endswith(".csv"):
                        # os.path.join inserts the separator between directory and file name
                        csv_path = os.path.join(dirpath, name)
                        if os.path.exists(csv_path):
                            try:
                                with open(csv_path, 'rb') as f:
                                    t = f.read().splitlines()
                                    print(t)
                            except IOError as e:
                                print(e)
The above code should work, if I have understood you correctly.
Upvotes: 1
Reputation: 3555
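You are using os.walk() incorrectly. There is no need to loop over dirs, because os.walk() already descends into every subdirectory, and the file path should be built from root rather than from root_path.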
def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(".csv"):
                # root is the directory that actually contains name
                csv_path = os.path.join(root, name)
                try:
                    with open(csv_path, 'rb') as f:
                        t = f.read().splitlines()
                        print(t)
                except IOError as e:
                    print(e)
The os.walk() function yields a 3-tuple (dirpath, dirnames, filenames) for every directory it visits. "dirpath" is the path of the directory currently being walked, "dirnames" is a list of the subdirectories in "dirpath", and "filenames" is a list of the files in "dirpath". Here "dirpath" is "path" itself on the first iteration, and then each subfolder of "path" in turn.
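To go from printing each file to the combined file the question asks for, something along these lines might work. It is only a sketch in Python 3, under a few assumptions that are not in the original post: the function name combine_csv_columns and the out_path parameter are made up for illustration, headers are built as "<subdirectory>_<file name without .csv>", and columns of different lengths are padded with empty cells.

```python
import csv
import os
from itertools import zip_longest


def combine_csv_columns(path, out_path):
    """Collect every single-column .csv under path into one wide CSV file."""
    columns = {}  # header -> list of cell values, in the order files are found
    for dirpath, dirnames, filenames in os.walk(path):
        dirnames.sort()  # visit subdirectories in alphabetical order
        for name in sorted(filenames):
            if not name.endswith(".csv"):
                continue
            # Join with dirpath (the directory being walked), not the top-level path
            csv_path = os.path.join(dirpath, name)
            # Assumed header format: "<subdirectory>_<file name without .csv>"
            rel_dir = os.path.relpath(dirpath, path)
            stem = os.path.splitext(name)[0]
            if rel_dir == ".":
                header = stem
            else:
                header = rel_dir.replace(os.sep, "_") + "_" + stem
            with open(csv_path, newline="") as f:
                # Each file has one column, so keep only the first cell of each row
                columns[header] = [row[0] if row else "" for row in csv.reader(f)]

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns.keys())
        # zip_longest turns the columns into rows, padding short columns with ""
        writer.writerows(zip_longest(*columns.values(), fillvalue=""))
    return list(columns)
```

It could be called as combine_csv_columns(os.path.expanduser('~/csv_files'), 'combined.csv'). Writing the output somewhere outside the walked directory is probably safest, since otherwise a second run would pick up the combined file as one more input.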
Upvotes: 2