I am trying to run the following code on an R data frame from Python, using rpy2.
```python
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import os
import pandas as pd
import timeit
from rpy2.robjects import r
from rpy2.robjects import pandas2ri

pandas2ri.activate()
start = timeit.default_timer()

def f(x):
    return fuzz.partial_ratio(str(x["sig1"]), str(x["sig2"]))

def fu_match(file):
    f1 = r.load(file)
    f1 = pandas2ri.ri2py(f1)
    f1["partial_ratio"] = f1.apply(f, axis=1)
    f1 = f1.loc[f1["partial_ratio"] > 90]
    f1.to_csv("test.csv")
    stop = timeit.default_timer()
    print(stop - start)

fu_match('test_full.RData')
```
Here is the error:

```
AttributeError: 'numpy.ndarray' object has no attribute 'apply'
```
I guess the problem has to do with the conversion from the R data frame to a pandas DataFrame. I know this is a repeated question, but I have tried all the solutions given to previous questions without success.
Any help would be much appreciated.
EDIT: Here is the head of the data frame stored in the .RData file:

```
  city sig1                        sig2
1   19 claudiopillonrobertoscolari almeidabartolomeufrancisco
2   19 claudiopillonrobertoscolari cruzricardosantasergiosilva
3   19 claudiopillonrobertoscolari costajorgesilva
4   19 claudiopillonrobertoscolari costafrancisconaifesilva
5   19 claudiopillonrobertoscolari camarajoseluizreis
6   19 claudiopillonrobertoscolari almeidafilhojoaopimentel
```
Answer:
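This line

```python
f1=pandas2ri.ri2py(f1)
```

is setting `f1` to be a `numpy.ndarray`, when I think you expect it to be a `pandas.DataFrame`. The reason is that R's `load()` does not return the data it loads: it returns a character vector with the names of the objects it created, so what gets converted is that vector of names, not your data frame.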
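A minimal sketch of fetching the data frame itself, assuming rpy2 2.x (the release line that provides the `pandas2ri.ri2py` used in the question; in rpy2 3.x the equivalent converter is named `pandas2ri.rpy2py`), an .RData file holding exactly one object, and that `pandas2ri.activate()` has not been called, so the conversion is done explicitly. The helper name `load_rdata_frame` is hypothetical. It looks the object up by the name `load()` returned, instead of converting the return value of `load()`:

```python
# Sketch only: assumes rpy2 2.x, an .RData file with a single data.frame,
# and that pandas2ri.activate() has NOT been called (explicit conversion).


def load_rdata_frame(path):
    """Hypothetical helper: load an .RData file and return its data.frame as pandas."""
    # Imported here so this module can be imported on machines without R.
    from rpy2.robjects import r, pandas2ri

    # R's load() returns the *names* of the objects it created, not the data.
    names = list(r.load(path))
    if len(names) != 1:
        raise ValueError("expected one object in %s, found %r" % (path, names))

    # Look the object up by name in R's global environment, then convert it.
    return pandas2ri.ri2py(r[names[0]])
```

With that, `f1 = load_rdata_frame('test_full.RData')` should give a DataFrame whose `sig1` and `sig2` columns the original `f(x)` can use.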
You can cast the array into a `DataFrame` with something like

```python
f1 = pd.DataFrame(data=f1)
```

but you won't have your column names defined (which you use in `f(x)`). What is the structure of `test_full.RData`? Do you want to define your column names manually? If so,

```python
f1 = pd.DataFrame(data=f1, columns=("my", "column", "names"))
```

should do the trick.
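To illustrate the point about column names, here is a small sketch using a hypothetical object array that holds two rows from the head shown in the question, with the column names `city`, `sig1` and `sig2` taken from that output. The row function `first_letters_match` is a hypothetical stand-in for `f(x)`, so the example runs without fuzzywuzzy:

```python
import numpy as np
import pandas as pd

# Hypothetical 2-D array standing in for converted data; rows copied from
# the head of test_full.RData shown in the question.
arr = np.array([
    [19, "claudiopillonrobertoscolari", "almeidabartolomeufrancisco"],
    [19, "claudiopillonrobertoscolari", "costajorgesilva"],
], dtype=object)

# Without columns=..., pandas labels the columns 0, 1, 2.
unnamed = pd.DataFrame(data=arr)

# Naming the columns makes label-based access such as x["sig1"] work.
f1 = pd.DataFrame(data=arr, columns=["city", "sig1", "sig2"])


def first_letters_match(x):
    # Hypothetical stand-in for f(x): any function reading x["sig1"] and x["sig2"].
    return str(x["sig1"])[0] == str(x["sig2"])[0]


# axis=1 passes each row to the function as a Series indexed by column name.
f1["match"] = f1.apply(first_letters_match, axis=1)
```

Without `columns=`, each row Series is indexed by `0`, `1`, `2`, so `x["sig1"]` inside the row function would raise a `KeyError`.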
But I would suggest looking at a more standard data format, such as .csv. pandas has good support for it (`pd.read_csv`), and R does too (`write.csv`). Check out the docs.
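A sketch of the .csv route, assuming the data frame was first written from R with `write.csv(..., row.names = FALSE)`. The in-memory CSV text reuses two rows from the question's head, `fu_match_csv` is a hypothetical name, and `similarity` (built on the standard library's `difflib`) is a swapped-in stand-in for `fuzz.partial_ratio`, not the same metric:

```python
import difflib
import io

import pandas as pd

# Hypothetical CSV content, standing in for a file written on the R side with
# write.csv(<your data frame>, "test_full.csv", row.names = FALSE).
CSV_TEXT = """city,sig1,sig2
19,claudiopillonrobertoscolari,almeidabartolomeufrancisco
19,claudiopillonrobertoscolari,costajorgesilva
"""


def similarity(a, b):
    # Stand-in for fuzz.partial_ratio: difflib's ratio scaled to 0-100.
    # It is NOT the same metric, so scores differ from fuzzywuzzy's.
    return 100 * difflib.SequenceMatcher(None, str(a), str(b)).ratio()


def fu_match_csv(source, threshold=90):
    # source may be a file path such as "test_full.csv" or a file-like object.
    df = pd.read_csv(source)
    df["score"] = df.apply(lambda x: similarity(x["sig1"], x["sig2"]), axis=1)
    # Keep only rows whose score exceeds the threshold, as in the question.
    return df.loc[df["score"] > threshold]


matches = fu_match_csv(io.StringIO(CSV_TEXT))
```

Passing a file path instead of the `io.StringIO` object, and then calling `matches.to_csv("test.csv")`, would mirror the original `fu_match`.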