add-semi-colons

Reputation: 18800

Combine CSV file data to one CSV file

I have CSV files spread across multiple directories, and each CSV file has only one column of data. I want to read all of these files and bring each file's column into one CSV file. The final CSV file will have one column per source file, with the filename as the header and that file's data as the column values.

This is my directory structure (output of ls inside ~/csv_files/):

ab   arc  bat-smg   bn       cdo  crh      diq  es   fo   gd   haw  ia   iu   ki   ksh  lez  lv   mo   na      no   os   pih  rmy   sah  simple  ss   tet  tr   ur   war  zea
ace  arz  bcl       bo       ce   cs       dsb  et   fr   gl   he   id   ja   kk   ku   lg   map-bms  mr   nah     nov  pa   pl   rn    sc   sk      st   tg   ts   uz   wo   zh
af   as

Each directory has two CSV files. I thought of using the os.walk() function, but I think my understanding of os.walk() is incorrect, which is why what I currently have doesn't produce anything.

import sys, os
import csv

root_path = os.path.expanduser(
    '~/data/missing_files')

def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            for name in files:
                if name.endswith(".csv"):
                    csv_path = os.path.expanduser(root_path + name)
                    if os.path.exists(csv_path):
                        try:
                            with open(csv_path, 'rb') as f:
                                t = f.read().splitlines()
                                print t
                        except IOError, e:
                            print e

def main():
    combine_csv_files(root_path)

if __name__=="__main__":
    main()

My questions are:

  1. What am I doing wrong here?
  2. Can I read a single CSV column from one file and add that data as a column to another file? CSV files are row-oriented, and here there is no dependency between rows.

In the end I am trying to get a CSV file like this (here are the potential headers):

ab_csv_data_file1, ab_csv_data_file2, arc_csv_data_file1, arc_csv_data_file2

Upvotes: 0

Views: 684

Answers (2)

Jerry Meng

Reputation: 1506

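I'm not sure I understand what you mean. Let's say you have multiple folders, such as "ab", "arc" and so on, and each folder contains two CSV files.

If I am right, then the problem is your loop: "files" lists the files in "root" itself, not the files inside each entry of "dirs", and root_path + name drops both the subdirectory and the path separator, so the os.path.exists() check never passes.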

def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            # Walk each subdirectory of the top-level path
            for dirpath, sub_dirs, sub_files in os.walk(os.path.join(path, dir)):
                for name in sub_files:
                    if name.endswith(".csv"):
                        # Join with a separator; dirpath + name would glue them together
                        csv_path = os.path.join(dirpath, name)
                        if os.path.exists(csv_path):
                            try:
                                with open(csv_path, 'rb') as f:
                                    t = f.read().splitlines()
                                    print t
                            except IOError, e:
                                print e

The above code should work, if I have understood you correctly.

Upvotes: 1

Sheng

Reputation: 3555

You are using os.walk() incorrectly.

def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(".csv"):
                csv_path = os.path.join(root, name)
                try:
                    with open(csv_path, 'rb') as f:
                        t = f.read().splitlines()
                        print t
                except IOError, e:
                    print e

The os.walk() function yields a 3-tuple (dirpath, dirnames, filenames). "dirpath" is the path of the directory currently being walked, "dirnames" is a list of the subdirectories in "dirpath", and "filenames" is a list of the files in "dirpath". "dirpath" will be "path" itself first, and then each subfolder of "path" in turn, so a single loop over the files of each tuple is enough.
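For the second part of the question (one column per file), here is a minimal sketch of how the same walk could feed a combined CSV. It is written for Python 3, unlike the Python 2 snippets above, and it has not been run against your data. The header scheme ("<directory>_<file name>"), the helper name collect_columns, and the out_path parameter are assumptions of mine, not anything from your code. It also assumes every file has a single column and no header row of its own.

```python
import csv
import os
from itertools import zip_longest


def collect_columns(path):
    """Return {header: [values]} for every .csv file found under path."""
    columns = {}
    for dirpath, dirnames, filenames in os.walk(path):
        dirnames.sort()  # sorting in place makes the walk order stable
        for name in sorted(filenames):
            if not name.endswith(".csv"):
                continue
            # Assumed header scheme: "<directory>_<file name without .csv>"
            header = "{}_{}".format(
                os.path.basename(dirpath), os.path.splitext(name)[0]
            )
            with open(os.path.join(dirpath, name), newline="") as f:
                # Keep the first field of each non-empty row
                columns[header] = [row[0] for row in csv.reader(f) if row]
    return columns


def combine_csv_files(path, out_path):
    """Write one CSV with a column per source file found under path."""
    columns = collect_columns(path)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(list(columns))  # header row, in walk order
        # zip_longest turns columns into rows and pads short columns with ""
        writer.writerows(zip_longest(*columns.values(), fillvalue=""))
```

You would call it as combine_csv_files(os.path.expanduser('~/csv_files'), 'combined.csv'). Keep the output file outside the directory being walked, otherwise a second run would pick up the combined file as one more input. The column order relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 on.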

Upvotes: 2
