edche
edche

Reputation: 680

How to merge multiple csv files?

I have several csv files that has same first row element in it. For example:

csv-1.csv:
Value,0
Currency,0
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0

csv-2.csv:
Value,0
Currency,1
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0

And with these files (more than 2 files by the way) I want to merge them and create something like that:

left csv-1 csv-2
Value 0 0
Currency 0 1
datetime 0 0

How can I create this funtion in python?

Upvotes: 1

Views: 163

Answers (3)

imxitiz
imxitiz

Reputation: 3987

Here's what you can do

import pandas as pd
from glob import glob

def refineFilename(path):
    return path.split(".")[0]

df=pd.DataFrame()

for file in glob("csv-*.csv"):
    new=pd.read_csv(file,header=None,index_col=[0])
    df[refineFinename(file)]=new[1]

df.reset_index(inplace=True)
df.rename(columns={0:"left"},inplace=True)

print(df)

"""
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0
"""

What we are doing here is making the df variable store all data, and then iterating through all files and adding a second column of those files to df with file name as the column name. 

Upvotes: 1

Corralien
Corralien

Reputation: 120559

You can use:

import pandas as pd
import pathlib

out = (pd.concat([pd.read_csv(csvfile, header=None, index_col=[0], names=[csvfile.stem])
           for csvfile in sorted(pathlib.Path.cwd().glob('*.csv'))], axis=1)
         .rename_axis('left').reset_index())

Output:

>>> out
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0

Upvotes: 1

angwrk
angwrk

Reputation: 420

First, you must create indexes in dataframes by first columns, on which you will further join:

import pandas as pd
import numpy as np

df1 = pd.read_csv('csv-1.csv')
df2 = pd.read_csv('csv-2.csv')

df1 = df1.set_index('col1')
df2 = df2.set_index('col1')

df = df1.join(df2, how='outer')

Then rename the column names if needed, or make a new index

Upvotes: 2

Related Questions