Reputation: 363
I am trying merge multiple files based on a key ('r_id') and rename the column names in the output with the name of the files. I could able to do every thing except renaming the output with the file names. I have the following error probably caused by the old version of Pandas. Does any one know how to fix this with out updating pandas to new version?
Error
Traceback (most recent call last):
File "multijoin_2.py", line 19, in <module>
result = merge_files(files).reset_index()
File "multijoin_2.py", line 11, in merge_files
pd.read_csv(f, sep='\t', usecols=['r_id', 'exp'])
File "/users/xxx/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 2007, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'assign'
Input
$ cat test1
r_id g_id exp
r1 g1 20
r2 g1 30
r3 g1 1
r4 g1 3
$ cat test2
r_id gid exp
r1 g2 20
r2 g2 30
r3 g2 1
r4 g2 3
$ cat test3
r_id g_id exp
r1 g3 30
r2 g3 40
r3 g3 11
r4 g3 32
Desired Ouput
r_id test3 test2 test1
0 r1 30 20 20
1 r2 40 30 30
2 r3 11 1 1
3 r4 32 3 3
Working code (except column naming)
import os
import glob
import pandas as pd
files = glob.glob(r'/path/test*')
def merge_files(files, **kwargs):
dfs = []
for f in files:
dfs.append(
pd.read_csv(f, sep='\t', usecols=['r_id', 'exp'])
#.assign(col=0)
.rename(columns={'col_name':os.path.splitext(os.path.basename(f))[0]})
.set_index(['repeat_id'])
)
return pd.concat(dfs, axis=1)
result = merge_files(files).reset_index()
print(result)
Upvotes: 2
Views: 24404
Reputation: 862691
You need change exp
as column name for rename:
def merge_files(files, **kwargs):
dfs = []
for f in files:
dfs.append(
pd.read_csv(f, sep='\t', usecols=['r_id', 'exp'], index_col=['r_id'])
.rename(columns={'exp':os.path.splitext(os.path.basename(f))[0]})
)
return pd.concat(dfs, axis=1)
result = merge_files(files).reset_index()
print(result)
r_id test1 test2 test3
0 r1 20 20 30
1 r2 30 30 40
2 r3 1 1 11
3 r4 3 3 32
Upvotes: 2