Reputation: 7127
I have this data (generated in R) and I use the reticulate
package to port it over to Python. The problem is with my Python code.
R code :
text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")
ID <- c(1,2,3,4)
df <- data.frame(cbind(ID, text))
library(reticulate)
df_py <- r_to_py(df)
repl_python()
Python code :
import gensim
LabeledSentence1 = gensim.models.doc2vec.TaggedDocument
all_content_data = []
j = 0
for em in r.df_py['text'].values:
    all_content_data.append(LabeledSentence1(em,[j]))
j+=1
print('Number of texts processed: ', j)
Note: r.df_py['text'] goes through reticulate's r object, which exposes R objects to Python; it can be changed to df_py['text'] if just using Python.
The code is supposed to process all the documents, but when I print, it says Number of texts processed: 1 when it should say Number of texts processed: 4. I just don't know where I am going wrong in that loop. My data is a data frame where each row holds a unique "book": all the text of that book is in one cell, and I want to process that cell.
Upvotes: 0
Views: 44
Reputation: 82
Your increment statement is not indented correctly, so it is outside the loop. Here is how it should be:
for em in r.df_py['text'].values:
    all_content_data.append(LabeledSentence1(em,[j]))
    j+=1
When first switching from Java to Python I made that mistake a lot, so don't feel alone :)
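To see the effect of the indentation without reticulate or gensim, here is a minimal sketch. The plain list `texts` stands in for r.df_py['text'].values, and the names `count_outside` and `count_inside` are made up for illustration; none of them come from the question.

```python
# Plain list standing in for r.df_py['text'].values (assumption for illustration).
texts = [
    "Because I could not stop for Death -",
    "He kindly stopped for me -",
    "The Carriage held but just Ourselves -",
    "and Immortality",
]

# Increment outside the loop body: it runs once, after the loop has finished.
count_outside = 0
for em in texts:
    pass  # per-document work would go here
count_outside += 1

# Increment inside the loop body: it runs once per document.
count_inside = 0
for em in texts:
    count_inside += 1

print('outside:', count_outside)  # outside: 1
print('inside:', count_inside)    # inside: 4
```

Only the indentation of the increment differs between the two loops, and that alone decides whether the counter reflects the number of documents.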
Upvotes: 1
Reputation: 5109
Your j += 1 is outside the loop, so it runs only once, after the loop has finished, and j ends at 1. Put it inside the for-loop's indentation:
for em in r.df_py['text'].values:
    all_content_data.append(LabeledSentence1(em,[j]))
    j+=1
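One way to sidestep this kind of slip is to let `enumerate` supply the index instead of keeping a manual counter. The sketch below is only an illustration under some assumptions: a plain list `texts` stands in for r.df_py['text'].values, a local namedtuple stands in for gensim.models.doc2vec.TaggedDocument so the snippet runs without gensim installed, and the text is split on whitespace into tokens, which the question's code does not do.

```python
from collections import namedtuple

# Stand-in with the same two fields (words, tags) as gensim's TaggedDocument,
# defined locally only so this sketch runs without gensim (assumption).
TaggedDocument = namedtuple('TaggedDocument', ['words', 'tags'])

# Plain list standing in for r.df_py['text'].values (assumption).
texts = [
    "Because I could not stop for Death -",
    "He kindly stopped for me -",
    "The Carriage held but just Ourselves -",
    "and Immortality",
]

# enumerate yields (index, item) pairs, so there is no separate counter
# whose increment could end up outside the loop.
# Splitting on whitespace is an assumption; the question passes the raw string.
all_content_data = [
    TaggedDocument(em.split(), [j]) for j, em in enumerate(texts)
]

# The number of processed texts is simply the length of the result.
print('Number of texts processed: ', len(all_content_data))
```

With the real gensim class, the namedtuple line would be replaced by the LabeledSentence1 alias from the question; the rest of the pattern stays the same.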
Upvotes: 3