bariskau
bariskau

Reputation: 327

How can I merge multiple csv files with Python as I want?

I have a few csv files for my homework. I want to combine them as in the example below. But I didn't know how to do it.

Exp1.csv

"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","23,78","25,54","25,54","23,78","-","-7,04%"
"25.08.2019","25,58","23,96","26,00","23,56","2,14M","4,07%"

Exp2.csv

"DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","4,16","4,15","4,23","4,12","-","0,73%"
"25.08.2019","4,13","4,05","4,19","4,03","6,48M","1,98%"

I want to merge 2 files like this. I just want to get the VOL% column.


"DATE","Exp1","Exp2"
"01.09.2019","-7,04%","0,73%"
"25.08.2019","4,07%","1,98%"

Thank you everyone:) I found a solution like this and applied it.

import glob
import os
import pandas.io

path =r'/Users/baris/Documents/Files/'
all_files = glob.glob(os.path.join(path, "*.csv"))
df_from_each_file = (pandas.read_csv(f) for f in all_files)
concatenated_df = pandas.concat(df_from_each_file, axis=1)
concatenated_df_clean = (concatenated_df.drop('DATE',1).drop('NOW',1).drop('OPEN',1).drop('HIGH.',1).drop('Low',1).drop('Hac.',1)

df_dates_file = pandas.read_csv('/Users/baris/Documents/Files/Exp1.csv')
df_date_export = concatenated_df.iloc[:, 0]

final_result = pandas.concat([df_date_export,concatenated_df_clean], axis=1)
print(final_result)


Upvotes: 0

Views: 461

Answers (4)

Suriem
Suriem

Reputation: 129

Use the csv module.

https://docs.python.org/3/library/csv.html

read this tutorial:

https://realpython.com/python-csv/

something like this will do the work: (educational code)

import io
import csv

target = {}

file_one_string =\
""""DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","23,78","25,54","25,54","23,78","-","-7,04%"
"25.08.2019","25,58","23,96","26,00","23,56","2,14M","4,07%"
"""
file_two_string = \
""""DATE","NOW","OPEN","HIGH","LOW","Hac.","VOL %"
"01.09.2019","4,16","4,15","4,23","4,12","-","0,73%"
"25.08.2019","4,13","4,05","4,19","4,03","6,48M","1,98%"
"""


with io.StringIO(file_one_string) as file_one:
    csv_reader = csv.DictReader(file_one,delimiter=',',quotechar='"')
    for row in csv_reader:
        if 'VOL %' in row:
            target[row['DATE']] ={'Exp1': row['VOL %']}

with io.StringIO(file_two_string) as file_two:
    csv_reader = csv.DictReader(file_two,dialect="excel")
    for row in csv_reader:
        if row['DATE'] in target:
            target[row['DATE']]['Exp2'] = row['VOL %']
        else:
            print('Missing DATE {} in file_two'.format(row['DATE']))
    lines2 = [row for row in csv_reader]


with io.StringIO() as output_file:
    fieldnames = ['DATE', 'Exp1', 'Exp2']
    csv_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    for key, value in target.items():
        csv_writer.writerow({
            'DATE': key,
            'Exp1': value['Exp1'],
            'Exp2': value['Exp2']
        })

    print(output_file.getvalue())

Upvotes: -1

Shabbir
Shabbir

Reputation: 153

You can use the pandas package to read and save csv. However you cannot remove columns while merging the csv files, but you can save the columns that you need Have a look at my code below. Replace the csv filename and column name with yours.

import pandas as pd

# create list of files you want to merge
all_filenames = ['test.csv','test1.csv']

# use pandas concat function to merge csv's
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])

# export the csv
combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig',columns=['test1'])

Upvotes: 1

user96564
user96564

Reputation: 1607

try something like this:

df = pd.read_csv('Exp1.csv')

df1 = pd.read_csv('Exp2.csv')

df['DATE'] = pd.to_datetime(df['DATE'])
df1['DATE'] = pd.to_datetime(df['DATE'])

final_df = pd.merge(df[['DATE', 'VOL %']], df1[['DATE', 'VOL %']], on='DATE')

print(final_df)
      DATE VOL %_x VOL %_y
2019-01-09  -7,04%   0,73%
2019-08-25   4,07%   1,98%

Upvotes: 0

wcarhart
wcarhart

Reputation: 2773

import csv

with open('Exp1.csv', 'r') as f1:
    csv_reader = csv.reader(f1, delimiter=',')
    lines1 = [row for row in csv_reader]

with open('Exp2.csv', 'r') as f2:
    csv_reader = csv.reader(f2, delimiter=',')
    lines2 = [row for row in csv_reader]

del lines1[0]
del lines2[0]
with open('output.csv', 'w+') as output_file:
    output_file.write('"DATE","Exp1","Exp2"\n')
    for index, _ in enumerate(lines1):
        date = lines1[index][0]
        vol1 = lines1[index][6]
        vol2 = lines2[index][6]
        output_file.write(f'"{date}","{vol1}","{vol2}"\n')

This assumes the following:

  • VOL % is always going to be in the 7th column (like in your example)
  • DATE is always going to be in the 1st column (like in your example)
  • there will always be the same number of lines in Exp1.csv and Exp2.csv
  • the "DATE" will always be the same in Exp1.csv and Exp2.csv

Read more about the CSV module: https://docs.python.org/3/library/csv.html

Upvotes: 1

Related Questions