PushT
PushT

Reputation: 57

Merge one-column csv files into a single csv file

I have seen some answers to this type of question here, but not enough to really help me. I did a split on a 9- column .csv - file & wrote them to vectors for other work in c++. They are subsequently written back to a folder as Single- column .csv files that basically look similar to this:

date
20171012
20171011
20171010
20171009
20171006
20171005
20171004

Now I want to combine all those 9 simple csv files into 1 file again so that they stack up horizontally next to each other,like this in the new file:

date,value,etc...     
20171012,2501593,etc..
20171011,2176309,etc..
20171010,3484064,etc..
20171009,1785852,etc..
20171006,1785852,etc..
20171005,16476641,etc..
20171004,1235406,etc..

I hope this is easy enough to understand. My code below is as follows:

import csv
data = [] # Buffer list
files = ['./CalculatedOutput/quote_date.csv', './CalculatedOutput/paper.csv', './CalculatedOutput/exch.csv', './CalculatedOutput/open.csv', './CalculatedOutput/high.csv', './CalculatedOutput/low.csv', './CalculatedOutput/close.csv', './CalculatedOutput/volume.csv', './CalculatedOutput/value.csv']

for filename in files:
    with open(filename, 'r') as csvfile:
        stocks = csv.reader(csvfile)
        for row in stocks:
            new_row = [row[0]]
            data.append(new_row)
        with open("CalculatedOutput/Opera.csv", "w+") as to_file:
            writer = csv.writer(to_file , delimiter=",")
            for new_row in data:
                writer.writerow(new_row)

This code DOES move all the rows of the columns into 1 new file, but it just puts them underneath one another. How would I go about writing the columns next to each other,comma separated ? I have tried extensively with Pandas, numpy and the csv lib according to concat, merge and others, but i just can't find the right way. I don't think I'm that far off, but my python is not the best unfortunately!

Upvotes: 3

Views: 2019

Answers (3)

use the os to find all files in the a directory endingwith input* then using the column name in the csv automatic stack the data by row using pd.concat. assume an encoding of iso-8859-1 if not unicode.

  path= 'C:\\Users\\your_username\\stacked_csv' 

  # get all csv files in input directory
  csv_files = glob.glob(os.path.join(path, 'input*.csv'))

  # read all csv files
  df_list = []
  for csv_file in csv_files:
      df = pd.read_csv(csv_file,encoding='iso-8859-1')
      df_list.append(df.dropna())

  # stack all csv files
  df_stacked = pd.concat(df_list, axis=0)

  output_file=path+"\\result_all.csv"
  # write stacked csv file
  df_stacked.to_csv(output_file, index=False)

  print(pd.read_csv(output_file))

Upvotes: 0

Moses Koledoye
Moses Koledoye

Reputation: 78556

You can open all the files using a single context manager with contextlib.ExitStack (in Python 3) and then write to the output file after applying zip on the iterable of files:

import csv
from contextlib import ExitStack

outfile = "CalculatedOutput/Opera.csv"
with ExitStack() as stack, open(outfile, "w+") as to_file:
    # open all files
    fs = [stack.enter_context(open(fname)) for fname in files]
    fs = map(csv.reader, fs)
    # write all rows from all files
    csv.writer(to_file).writerows(zip(*fs))

Update:

If the file contains characters that cannot be decoded to UTF-8 (the default encoding for open), you can use intermediate surrogate characters when reading, which are replaced with their original form at write time:

with ExitStack() as stack, open(outfile, "w+", errors='surrogateescape') as to_file :
    fs = [stack.enter_context(open(fname, errors='surrogateescape')) for fname in files]
    ...

Upvotes: 3

Anton vBR
Anton vBR

Reputation: 18916

I read you tried pandas, what went wrong there?. With pandas we can simply use pd.concat([df1,df2....]). So let's read them in and bunch them together:

import pandas as pd

df = pd.concat((pd.read_csv(f) for f in files),axis=1) # axis1 for horizontal
df.to_csv("CalculatedOutput/Opera.csv",index=False)

Example:

Let's first create two imaginary files:

file1 = """date
20171012
20171011
20171010
20171009
20171006
20171005
20171004"""

file2 = """number
1
2
3
4
5
6
7"""

files = [io.StringIO(f) for f in [file1,file2]]

import pandas as pd

df = pd.concat([pd.read_csv(f) for f in files],axis=1)

print(df)

       date  number
0  20171012       1
1  20171011       2
2  20171010       3
3  20171009       4
4  20171006       5
5  20171005       6
6  20171004       7

Upvotes: 2

Related Questions