uday

Reputation: 6713

convert_to_r_dataframe raises AttributeError: 'DataFrame' object has no attribute 'dtype'

I have a pandas DataFrame of 153895 rows x 644 columns, read from a CSV file. A few columns hold strings and the rest hold integers or floats. I am trying to save it as an Rda file.

I tried:

import pandas.rpy.common as com
myDFinR = com.convert_to_r_dataframe(myDF)

I get the following error:

Traceback (most recent call last):
  File "C:\PF\WinPython-64bit-3.3.3.3\python-3.3.3.amd64\lib\site-packages\IPython\core\interactiveshell.py", line 2828, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-101-7d2a8ae98ea4>", line 1, in <module>
    dDataR=com.convert_to_r_dataframe(dData)
  File "C:\PF\WinPython-64bit-3.3.3.3\python-3.3.3.amd64\lib\site-packages\pandas\rpy\common.py", line 305, in convert_to_r_dataframe
    value_type = value.dtype.type
  File "C:\PF\WinPython-64bit-3.3.3.3\python-3.3.3.amd64\lib\site-packages\pandas\core\generic.py", line 1815, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'dtype'

I checked myDF.dtypes and the output did not show anything unusual:

col1        object
col2        object
col3        int64
...
col642      float64
col643      float64
col644      float64
Length: 644, dtype: object

When I ran for i,j in enumerate(myDF.columns): print(i,":",myDF[j].dtype), it raised an error at column 359. However, myDF[[359]].dtypes gives me:

col359      float64
dtype: object

What could be the issue?

Upvotes: 1

Views: 691

Answers (1)

unutbu

Reputation: 879481

I can reproduce the error messages when myDF has non-unique column names:

import pandas as pd
import pandas.rpy.common as com

myDF = pd.DataFrame([[1,2],[3,4]], columns=['A','B'])
myDFinR = com.convert_to_r_dataframe(myDF)
print(myDFinR)   # 1

myDF2 = pd.DataFrame([[1,2],[3,4]], columns=['A','A'])
myDFinR2 = com.convert_to_r_dataframe(myDF2)
print(myDFinR2)  # 2

  1. Prints

      A B
    0 1 2
    1 3 4

  2. Raises AttributeError:

    AttributeError: 'DataFrame' object has no attribute 'dtype'
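
To check whether duplicate column names are the cause, something like the following may help. It is a minimal sketch on a toy frame (the names `df` and `dupes` are illustrative, not from the question). It shows that selecting a duplicated label returns a DataFrame rather than a Series, which would explain why `.dtype` is missing, and it lists the labels that occur more than once:

```python
import pandas as pd

# Toy frame with a repeated column label; stands in for the real data.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "A"])

# A unique label selects a Series; a duplicated label selects a DataFrame,
# and a DataFrame has .dtypes but no .dtype attribute.
print(type(df["B"]).__name__)
print(type(df["A"]).__name__)

# Quick uniqueness check, then the labels that appear more than once.
print(df.columns.is_unique)
dupes = df.columns[df.columns.duplicated()].unique().tolist()
print(dupes)
```

Running the last two lines against the real myDF should show whether the label at position 359 is repeated elsewhere in the frame.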
    

If this is indeed the source of your problem, you can fix it by renaming the columns to something unique:

myDF.columns = ['col{i}'.format(i=i) for i in range(len(myDF.columns))]
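
If you would rather keep the original names and only change the repeats, one possible approach is to append a numeric suffix to the second and later occurrences. The helper `make_unique` below is hypothetical (it is not part of pandas), and the `_1`, `_2` suffix scheme is just an assumption chosen for illustration:

```python
import pandas as pd


def make_unique(names):
    """Return names with repeats suffixed (_1, _2, ...) so all are unique."""
    taken = set(names)   # every name already in use, original or generated
    seen = set()         # names whose first occurrence has been kept
    result = []
    for name in names:
        if name not in seen:
            # First occurrence keeps its original name.
            seen.add(name)
            result.append(name)
            continue
        # Later occurrence: find the first suffix that is not already taken.
        i = 1
        candidate = "{}_{}".format(name, i)
        while candidate in taken:
            i += 1
            candidate = "{}_{}".format(name, i)
        taken.add(candidate)
        result.append(candidate)
    return result


# Toy frame with a repeated label; stands in for the real data.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "A"])
df.columns = make_unique(list(df.columns))
print(list(df.columns))
```

After the rename, each label selects a single Series again, so code that reads `.dtype` column by column should no longer hit a DataFrame.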

Upvotes: 1
