user1703276

Reputation: 363

How to fix AttributeError: 'DataFrame' object has no attribute 'assign' without updating pandas?

I am trying to merge multiple files on a key ('r_id') and rename the output columns after the files they came from. I was able to do everything except rename the output columns with the file names. I get the following error, probably caused by an old version of pandas. Does anyone know how to fix this without updating pandas to a newer version?

Error

    Traceback (most recent call last):
      File "multijoin_2.py", line 19, in <module>
        result = merge_files(files).reset_index()
      File "multijoin_2.py", line 11, in merge_files
        pd.read_csv(f, sep='\t', usecols=['r_id', 'exp'])
      File "/users/xxx/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 2007, in __getattr__
        (type(self).__name__, name))
    AttributeError: 'DataFrame' object has no attribute 'assign'

Input

$ cat test1

r_id       g_id exp
r1      g1      20
r2      g1      30
r3      g1      1
r4      g1      3

$ cat test2

r_id       gid exp
r1      g2      20
r2      g2      30
r3      g2      1
r4      g2      3

$ cat test3

r_id       g_id exp
r1      g3      30
r2      g3      40
r3      g3      11
r4      g3      32

Desired Output

  r_id  test3  test2  test1
0   r1     30     20     20
1   r2     40     30     30
2   r3     11      1      1
3   r4     32      3      3

Working code (except column naming)

import os
import glob
import pandas as pd

files = glob.glob(r'/path/test*')

def merge_files(files, **kwargs):
    dfs = []
    for f in files:
        dfs.append(
            pd.read_csv(f, sep='\t', usecols=['r_id', 'exp'])
              #.assign(col=0)
              .rename(columns={'col_name':os.path.splitext(os.path.basename(f))[0]})
              .set_index(['repeat_id'])
        )
    return pd.concat(dfs, axis=1)


result = merge_files(files).reset_index()
print(result)

Upvotes: 2

Views: 24404

Answers (1)

jezrael

Reputation: 862691

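You don't need assign here (it was only added in pandas 0.16.0). Pass exp as the key to rename, since that is the column being renamed, and set r_id as the index with index_col: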

def merge_files(files, **kwargs):
    dfs = []
    for f in files:
        dfs.append(
            pd.read_csv(f, sep='\t', usecols=['r_id', 'exp'], index_col=['r_id'])
              .rename(columns={'exp':os.path.splitext(os.path.basename(f))[0]})
        )
    return pd.concat(dfs, axis=1)

result = merge_files(files).reset_index()
print(result)
  r_id  test1  test2  test3
0   r1     20     20     30
1   r2     30     30     40
2   r3      1      1     11
3   r4      3      3     32
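
To try the same approach without files on disk, here is a minimal self-contained sketch. It assumes Python 3 and uses `io.StringIO` buffers in place of the real files; the `SAMPLES` dict and the `merge_sources` helper are hypothetical names made up for this example, not part of pandas or of the code above. It also sorts the names so the column order is predictable, which `glob.glob` does not guarantee.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the tab-separated files test1..test3.
SAMPLES = {
    "test1": "r_id\tg_id\texp\nr1\tg1\t20\nr2\tg1\t30\nr3\tg1\t1\nr4\tg1\t3\n",
    "test2": "r_id\tgid\texp\nr1\tg2\t20\nr2\tg2\t30\nr3\tg2\t1\nr4\tg2\t3\n",
    "test3": "r_id\tg_id\texp\nr1\tg3\t30\nr2\tg3\t40\nr3\tg3\t11\nr4\tg3\t32\n",
}


def merge_sources(sources):
    """Merge the 'exp' column of each source on 'r_id'.

    sources maps the desired output column name to a path or file-like object.
    """
    dfs = []
    for name in sorted(sources):  # sorted for a deterministic column order
        df = pd.read_csv(
            sources[name], sep="\t", usecols=["r_id", "exp"], index_col="r_id"
        )
        # Rename the existing 'exp' column; no assign() call is involved.
        dfs.append(df.rename(columns={"exp": name}))
    # Align the frames on the shared 'r_id' index, one column per source.
    return pd.concat(dfs, axis=1)


buffers = {name: io.StringIO(text) for name, text in SAMPLES.items()}
result = merge_sources(buffers).reset_index()
print(result)
```

With real files, the dict could be built from the paths instead, for example by mapping `os.path.splitext(os.path.basename(f))[0]` to `f` for each `f` in `files`.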

Upvotes: 2
