Reputation: 680
I have several csv files that has same first row element in it. For example:
csv-1.csv:
Value,0
Currency,0
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0
csv-2.csv:
Value,0
Currency,1
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0
And with these files (more than 2 files by the way) I want to merge them and create something like that:
left | csv-1 | csv-2 |
---|---|---|
Value | 0 | 0 |
Currency | 0 | 1 |
datetime | 0 | 0 |
How can I create this funtion in python?
Upvotes: 1
Views: 163
Reputation: 3987
Here's what you can do
import pandas as pd
from glob import glob
def refineFilename(path):
return path.split(".")[0]
df=pd.DataFrame()
for file in glob("csv-*.csv"):
new=pd.read_csv(file,header=None,index_col=[0])
df[refineFinename(file)]=new[1]
df.reset_index(inplace=True)
df.rename(columns={0:"left"},inplace=True)
print(df)
"""
left csv-1 csv-2
0 Value 0 0
1 Currency 0 1
2 datetime 0 0
3 Receiver 0 0
4 Beneficiary 0 0
5 Flag 0 0
6 idx 0 0
"""
What we are doing here is making the df
variable store all data, and then iterating through all files and adding a second column of those files to df
with file name as the column name.
Upvotes: 1
Reputation: 120559
You can use:
import pandas as pd
import pathlib
out = (pd.concat([pd.read_csv(csvfile, header=None, index_col=[0], names=[csvfile.stem])
for csvfile in sorted(pathlib.Path.cwd().glob('*.csv'))], axis=1)
.rename_axis('left').reset_index())
Output:
>>> out
left csv-1 csv-2
0 Value 0 0
1 Currency 0 1
2 datetime 0 0
3 Receiver 0 0
4 Beneficiary 0 0
5 Flag 0 0
6 idx 0 0
Upvotes: 1
Reputation: 420
First, you must create indexes in dataframes by first columns, on which you will further join:
import pandas as pd
import numpy as np
df1 = pd.read_csv('csv-1.csv')
df2 = pd.read_csv('csv-2.csv')
df1 = df1.set_index('col1')
df2 = df2.set_index('col1')
df = df1.join(df2, how='outer')
Then rename the column names if needed, or make a new index
Upvotes: 2