Reputation: 69

Finding the positions of words in sentences and populating them in a specified format in R

I am trying to get the position of each word in a sentence by creating a new column per word (named after the word) and writing that word's position within the sentence into it.

I can get the positions of the words using 'regexpr', but I don't know how to bring the result into the required format.

  Example:

     text <- c("Sam can often be found practicing his guitar in his bedroom.",
               "When it's raining Sam will often stay home all day",
               "Sam broke his guitar")

     words <- c("Sam","guitar","raining")


  Expected output format:

  **text**                                                             **Sam**       **guitar**          **raining**

  Sam can often be found practicing his guitar in his bedroom.            1              39                 -1

  When it's raining Sam will often stay home all day                      19             -1                 11

  Sam broke his guitar                                                    1              15                 -1

I understand that if a word is not found in a sentence, 'regexpr' returns -1. Can anyone please help me get the output in the desired format above?

Thank You!!

Upvotes: 1

Answers (4)

Joyvalley

Reputation: 164

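Hi, I would try something like this. I edited the code so that the output has the layout you need. Note that it reports each word's index within the sentence (first word = 1) rather than its character position.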

text <- ("Sam can often be found practicing his guitar in his bedroom. When it's raining Sam will often stay home all day. Sam broke his guitar")
words <- c("Sam","guitar","raining")

sentences <- strsplit(text,'\\. ')[[1]]
my_output <- data.frame(matrix(ncol=length(words),nrow=length(sentences)))
colnames(my_output) <- words
rownames(my_output) <- sentences
my_output

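for(j in seq_along(words)){          # one column per word
  for(i in seq_along(sentences)){    # one row per sentence
    appears <- which(strsplit(sentences[i], split=" ")[[1]] == words[j])
    if(length(appears)>0){
      my_output[i,j] <- appears[1]   # keep the first occurrence only
    }else{
      my_output[i,j] <- NA           # word does not occur in this sentence
    }
  }
}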
my_output

The output now looks like this:

                                                            Sam guitar raining
Sam can often be found practicing his guitar in his bedroom   1      8      NA
When it's raining Sam will often stay home all day            4     NA       3
Sam broke his guitar                                          1      4      NA
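
The same word-index table can also be built without explicit loops. This is only a minimal sketch: it assumes that just the first occurrence of each word per sentence is wanted and that tokens are separated by single spaces. The names `tokens` and `word_index` are introduced here for illustration.

```r
sentences <- c("Sam can often be found practicing his guitar in his bedroom",
               "When it's raining Sam will often stay home all day",
               "Sam broke his guitar")
words <- c("Sam", "guitar", "raining")

# Split each sentence into space-separated tokens
tokens <- strsplit(sentences, " ", fixed = TRUE)

# match() gives the index of the first token equal to the word, or NA if absent
word_index <- sapply(words, function(w) {
  vapply(tokens, function(tok) match(w, tok), integer(1))
})
rownames(word_index) <- sentences
word_index
```

Because `match()` compares whole tokens, a word followed directly by punctuation (for example "guitar.") would not be found unless the punctuation is stripped first.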

I hope that's what you wanted :-)

Upvotes: 2

Matt Jewett

Reputation: 3379

This might be a bit messy, but it also handles the case where the same word appears more than once in a sentence.

text <- c("Sam can often be found practicing his guitar in his bedroom.",
          "When it's raining Sam will often stay home all day",
          "Sam broke his guitar", 
          "Sam raining guitar Sam raining guitar")

words <- c("Sam","guitar","raining")

df <- data.frame(text, stringsAsFactors = FALSE)

for (w in words)
{
  # gregexpr() returns all match positions per sentence (-1 if none);
  # join them into one comma-separated string per row
  matches <- gregexpr(pattern = w, text = df$text)
  df[[w]] <- vapply(matches, paste, character(1), collapse = ", ")
}
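
If numeric positions are preferable to comma-separated strings, the matches could instead be kept as integer vectors. The following is a rough sketch under that assumption, not part of the original answer; `positions` is a hypothetical name and only two of the sentences are used to keep it short.

```r
text <- c("Sam broke his guitar",
          "Sam raining guitar Sam raining guitar")
words <- c("Sam", "guitar", "raining")

# For each word: a list holding one integer vector of match positions per sentence.
# as.integer() drops the extra attributes that gregexpr() attaches to each result.
positions <- lapply(setNames(words, words), function(w) {
  lapply(gregexpr(w, text, fixed = TRUE), as.integer)
})

positions$Sam[[2]]      # all positions of "Sam" in the second sentence
positions$raining[[1]]  # -1 because "raining" is absent from the first sentence
```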

Upvotes: 1

Bea

Reputation: 1110

Something like this?

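# One row per sentence, one column per word
res <- matrix(nrow=length(text), ncol=length(words))
rownames(res) <- text
colnames(res) <- words
for (i in seq_along(words)){
  # regexpr() gives the first match position in each sentence, or -1 if absent
  res[,i] <- regexpr(words[i], text)
}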

res


                                                             Sam guitar raining
Sam can often be found practicing his guitar in his bedroom.   1     39      -1
When it's raining Sam will often stay home all day            19     -1      11
Sam broke his guitar                                           1     15      -1
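
One caveat: `regexpr` matches substrings, so a pattern such as "Sam" would also match inside a longer word. Below is a hedged sketch of one way to restrict matches to whole words. It assumes the search words contain no regex metacharacters, and the sentence "Samuel met Sam" is made up purely for illustration.

```r
text <- c("Sam broke his guitar", "Samuel met Sam")
words <- c("Sam", "guitar")

# Wrap each word in \\b (word boundary) so "Sam" does not match inside "Samuel"
res <- sapply(words, function(w) {
  as.integer(regexpr(paste0("\\b", w, "\\b"), text, perl = TRUE))
})
rownames(res) <- text
res
```

Without the boundaries, "Sam" would be reported at position 1 of the second sentence; with them, the match is the standalone "Sam" at position 12.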

Upvotes: 2

PKumar

Reputation: 11128

You can use sapply with regexpr, which returns a plain integer matrix (gregexpr also works, but it reports every match and therefore gives a matrix of list cells):

sapply(words,function(x)regexpr(x,text))

Output:

     Sam guitar raining
[1,]   1     39      -1
[2,]  19     -1      11
[3,]   1     15      -1

One-liner for data.frame conversion:

df<-data.frame(text=text, sapply(words,function(x)regexpr(x,text)))

Output:

    #                                                            text Sam guitar raining
    # 1 Sam can often be found practicing his guitar in his bedroom.   1     39      -1
    # 2           When it's raining Sam will often stay home all day  19     -1      11
    # 3                                         Sam broke his guitar   1     15      -1
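
If the -1 placeholder is inconvenient for later processing, it can be turned into a missing value. This is a small sketch assuming NA is an acceptable marker for "not found"; `pos` is a name introduced here.

```r
text <- c("Sam can often be found practicing his guitar in his bedroom.",
          "When it's raining Sam will often stay home all day",
          "Sam broke his guitar")
words <- c("Sam", "guitar", "raining")

# Integer matrix of first-match positions, one column per word
pos <- sapply(words, function(x) as.integer(regexpr(x, text, fixed = TRUE)))

# Treat "not found" (-1) as missing
pos[pos == -1L] <- NA

df <- data.frame(text = text, pos, stringsAsFactors = FALSE)
df
```

With NA in place of -1, functions such as `is.na()` or `complete.cases()` can be used to filter sentences by which words they contain.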

Upvotes: 3
