user113156

Reputation: 7127

Number of texts processed = 1 when it should = 4 (function to process documents)

I have this data (generated in R), and I use the reticulate package to port it over to Python. The problem is with my Python code.

R code :

text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")

ID <- c(1,2,3,4)    
df <- data.frame(cbind(ID, text))
library(reticulate)

df_py <- r_to_py(df)
repl_python()

Python code :

import gensim
LabeledSentence1 = gensim.models.doc2vec.TaggedDocument
all_content_data = []
j = 0
for em in r.df_py['text'].values:
  all_content_data.append(LabeledSentence1(em,[j]))
j+=1
print('Number of texts processed: ', j)

Note: r.df_py['text'] uses reticulate's r object to access the R data from Python; it can be changed to df_py['text'] if just using Python.

The code is supposed to process the documents, but when I print the count it says Number of texts processed: 1 when it should say Number of texts processed: 4. I just don't know where I am going wrong in that loop. My data is a data frame; each row holds a unique "book", with all the text of that book in one cell, and I want to process each cell.

Upvotes: 0

Views: 44

Answers (2)

Shane

Reputation: 82

Your increment statement is not indented correctly, so it is outside the loop. Here is how it should be:

for em in r.df_py['text'].values:
   all_content_data.append(LabeledSentence1(em,[j]))
   j+=1

When first switching from Java to Python I made that mistake a lot, so don't feel alone :)

Upvotes: 1

FatihAkici

Reputation: 5109

Your j += 1 is outside the loop, so it runs only once, after the loop has finished. That is why j ends up at 1 (and why every document gets the tag [0]). Put it inside the for-loop's indentation:

for em in r.df_py['text'].values:
    all_content_data.append(LabeledSentence1(em,[j]))
    j+=1
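
As a possible alternative, the manual counter can be dropped altogether by letting enumerate supply the index, which removes the line that was mis-indented. This is only a minimal sketch under some assumptions: the texts list is hypothetical sample data standing in for r.df_py['text'].values, and the TaggedDocument namedtuple is a stand-in with the same two fields (words, tags) as gensim.models.doc2vec.TaggedDocument, so the snippet runs without gensim installed.

```python
from collections import namedtuple

# Stand-in for gensim.models.doc2vec.TaggedDocument (assumption: defined
# here only so the sketch runs without gensim; same fields: words, tags).
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])

# Hypothetical sample data mirroring the four rows in the question;
# in the real session this would be r.df_py['text'].values.
texts = [
    "Because I could not stop for Death -",
    "He kindly stopped for me -",
    "The Carriage held but just Ourselves -",
    "and Immortality",
]

# enumerate yields (index, item) pairs, so there is no separate
# counter statement whose indentation can go wrong.
all_content_data = [TaggedDocument(text, [i]) for i, text in enumerate(texts)]

# The count comes from the list itself rather than from a counter.
print("Number of texts processed:", len(all_content_data))
```

With the real gensim class, the stand-in definition would presumably be replaced by the LabeledSentence1 alias from the question; the rest of the loop logic stays the same.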

Upvotes: 3
