Reputation: 6713
I have a pandas DataFrame of 153895 rows x 644 columns (read from a CSV file). A few columns are strings and the others are integers and floats. I am trying to save it as an Rda file.
I tried:
import pandas.rpy.common as com
myDFinR = com.convert_to_r_dataframe(myDF)
I get the following error:
Traceback (most recent call last):
File "C:\PF\WinPython-64bit-3.3.3.3\python-3.3.3.amd64\lib\site-packages\IPython\core\interactiveshell.py", line 2828, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-101-7d2a8ae98ea4>", line 1, in <module>
dDataR=com.convert_to_r_dataframe(dData)
File "C:\PF\WinPython-64bit-3.3.3.3\python-3.3.3.amd64\lib\site-packages\pandas\rpy\common.py", line 305, in convert_to_r_dataframe
value_type = value.dtype.type
File "C:\PF\WinPython-64bit-3.3.3.3\python-3.3.3.amd64\lib\site-packages\pandas\core\generic.py", line 1815, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'dtype'
I checked myDF.dtypes
and the output did not show anything unusual:
col1 object
col2 object
col3 int64
...
col642 float64
col643 float64
col644 float64
Length: 644, dtype: object
When I looped over the columns with for i,j in enumerate(myDF.columns): print(i,":",myDF[j].dtype)
it raised an error at column 359. However, myDF[[359]].dtypes
gives me
col359 float64
dtype: object
What could be the issue?
Upvotes: 1
Views: 691
Reputation: 879481
I can reproduce the error message when myDF has non-unique column names:
import pandas as pd
import pandas.rpy.common as com
myDF = pd.DataFrame([[1,2],[3,4]], columns=['A','B'])
myDFinR = com.convert_to_r_dataframe(myDF)
print(myDFinR) # 1
myDF2 = pd.DataFrame([[1,2],[3,4]], columns=['A','A'])
myDFinR2 = com.convert_to_r_dataframe(myDF2)
print(myDFinR2) # 2
The first print (# 1) outputs:
A B
0 1 2
1 3 4
The second conversion (# 2) raises AttributeError:
AttributeError: 'DataFrame' object has no attribute 'dtype'
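The reason is that selecting a repeated label returns a DataFrame rather than a Series, and a DataFrame has .dtypes but no .dtype. That would also explain why myDF[j].dtype fails at one particular column in the question. A minimal sketch for checking this on a small frame, using only pandas (the names df, selected and dupes are made up for illustration):

```python
import pandas as pd

# Small frame with a repeated column label, mirroring the failing case
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'A'])

# Selecting a repeated label yields a DataFrame (both 'A' columns), not a Series
selected = df['A']
print(type(selected).__name__)

# Collect the labels that occur more than once
dupes = df.columns[df.columns.duplicated()].unique().tolist()
print(dupes)
```

Running the last two lines against myDF.columns should show whether any of the 644 names is repeated.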
If this is indeed the source of your problem, you can fix it by renaming the columns to something unique:
myDF.columns = ['col{i}'.format(i=i) for i in range(len(myDF.columns))]
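Note that the rename above throws away the original names. If they matter, one alternative is to keep each name and add a suffix only to the repeats. This is a sketch, not part of pandas: make_unique is a hypothetical helper, and the ".1", ".2" suffix style is just an assumption.

```python
def make_unique(names):
    """Return the names as strings, with repeats suffixed .1, .2, ... (hypothetical helper)."""
    used = set()
    result = []
    for name in names:
        name = str(name)
        candidate = name
        suffix = 0
        # Keep bumping the suffix until the candidate is not already taken
        while candidate in used:
            suffix += 1
            candidate = "{}.{}".format(name, suffix)
        used.add(candidate)
        result.append(candidate)
    return result


print(make_unique(['A', 'A', 'B', 'A']))
```

It could then be applied with myDF.columns = make_unique(myDF.columns) before calling convert_to_r_dataframe.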
Upvotes: 1