Reputation: 69
I am trying to get the position of each word in a sentence by creating a new column per word (named after the word) and writing the position of that word in the corresponding sentence into it.
I can get the positions of the words using 'regexpr', but I don't know how to bring the result into the required format.
Example:
text <- c("Sam can often be found practicing his guitar in his bedroom.","When it's raining Sam will often stay home all day",
"Sam broke his guitar")
words <- c("Sam","guitar","raining")
Expected output format:
**text** **Sam** **guitar** **raining**
Sam can often be found practicing his guitar in his bedroom. 1 39 -1
When it's raining Sam will often stay home all day 19 -1 11
Sam broke his guitar 1 15 -1
I understand that if a word is not found in a sentence, 'regexpr' returns -1. Can anyone please help me get the output in the desired format above?
Thank You!!
Upvotes: 1
Views: 233
Reputation: 164
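Hi, I would try something like this. I edited the code so that the output is in the format you need. Note that it reports the position of each word within the sentence (word index), not the character position that 'regexpr' returns.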
text <- ("Sam can often be found practicing his guitar in his bedroom. When it's raining Sam will often stay home all day. Sam broke his guitar")
words <- c("Sam","guitar","raining")
# Split the text into sentences on ". "
sentences <- strsplit(text,'\\. ')[[1]]
# One row per sentence, one column per word
my_output <- data.frame(matrix(ncol=length(words),nrow=length(sentences)))
colnames(my_output) <- words
rownames(my_output) <- sentences
my_output
for(j in seq_along(words)){
  for(i in seq_along(sentences)){
    # Index of the word among the space-separated tokens of the sentence
    appears <- which(strsplit(sentences[i], split=" ")[[1]] == words[j])
    if(length(appears)>0){
      my_output[i,j] <- appears[1]  # keep the first occurrence only
    }else{
      my_output[i,j] <- NA  # a real missing value, not the string 'NA'
    }
  }
}
my_output
The output now looks like this:
Sam guitar raining
Sam can often be found practicing his guitar in his bedroom 1 8 NA
When it's raining Sam will often stay home all day 4 NA 3
Sam broke his guitar 1 4 NA
I hope that's what you wanted :-)
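The same word-index idea can be sketched without the nested loop. This is only a sketch under one assumption that is not in the question: trailing punctuation is stripped before comparing, so "bedroom." would count as the word "bedroom". The names `tokens` and `word_index` are made up for illustration.

```r
text <- c("Sam can often be found practicing his guitar in his bedroom.",
          "When it's raining Sam will often stay home all day",
          "Sam broke his guitar")
words <- c("Sam", "guitar", "raining")

# Split each sentence on spaces and strip trailing punctuation
# (assumption: punctuation at the end of a token is not part of the word).
tokens <- lapply(strsplit(text, " ", fixed = TRUE),
                 function(tok) gsub("[[:punct:]]+$", "", tok))

# match() gives the index of the first occurrence of each word, or NA if absent.
# sapply() returns words in rows, so transpose to get one row per sentence.
word_index <- t(sapply(tokens, function(tok) match(words, tok)))
colnames(word_index) <- words
word_index
```

Missing words come back as NA here, whereas 'regexpr' would report -1 for them.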
Upvotes: 2
Reputation: 3379
This might be a bit messy, but it also handles the case where the same word appears more than once in a sentence.
text <- c("Sam can often be found practicing his guitar in his bedroom.",
"When it's raining Sam will often stay home all day",
"Sam broke his guitar",
"Sam raining guitar Sam raining guitar")
words <- c("Sam","guitar","raining")
df <- data.frame(text, stringsAsFactors = FALSE)
for(i in seq_along(words))
{
  # gregexpr() returns every match position per sentence; collapse them into one string such as "1, 20"
  word.locations <- sapply(gregexpr(pattern = words[i], df$text), paste, collapse = ", ")
  df <- cbind(df, word.locations, stringsAsFactors = FALSE)
}
colnames(df) <- c("text", words)
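For reference, a loop-free sketch of the same approach is shown below on two of the sentences. The helper name `all_positions` is hypothetical, and `fixed = TRUE` is my assumption that the words should be matched literally rather than as regular expressions.

```r
text <- c("Sam broke his guitar",
          "Sam raining guitar Sam raining guitar")
words <- c("Sam", "guitar", "raining")

# Hypothetical helper: for one word, collapse all character positions found in
# each sentence into a single string (e.g. "1, 20"); no match gives "-1".
all_positions <- function(word, sentences) {
  hits <- gregexpr(word, sentences, fixed = TRUE)
  vapply(hits, function(h) paste(as.integer(h), collapse = ", "), character(1))
}

# A named list of columns, one per word, is expanded by data.frame().
df <- data.frame(text,
                 lapply(setNames(words, words), all_positions, sentences = text),
                 stringsAsFactors = FALSE)
df
```

The position columns are character strings, since a cell may hold several positions.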
Upvotes: 1
Reputation: 1110
Something like this?
res<- matrix(nrow=length(text),ncol=length(words))
rownames(res) <- text
colnames(res) <- words
for (i in 1:length(words)){
res[,i]=regexpr(words[i],text)
}
res
Sam guitar raining
Sam can often be found practicing his guitar in his bedroom. 1 39 -1
When it's raining Sam will often stay home all day 19 -1 11
Sam broke his guitar 1 15 -1
Upvotes: 2
Reputation: 11128
You can use sapply with gregexpr as below:
sapply(words,function(x)gregexpr(x,text))
Output:
Sam guitar raining
[1,] 1 39 -1
[2,] 19 -1 11
[3,] 1 15 -1
One-liner for converting to a data.frame:
df<-data.frame(cbind(text=text ,setNames(sapply(words,function(x)gregexpr(x,text)),c("Sam","guitar","raining"))))
Output:
# text Sam guitar raining
# 1 Sam can often be found practicing his guitar in his bedroom. 1 39 -1
# 2 When it's raining Sam will often stay home all day 19 -1 11
# 3 Sam broke his guitar 1 15 -1
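Because gregexpr returns a list, the cells of the sapply result above are list elements rather than plain numbers. If only the first match per sentence is needed, a hedged alternative is regexpr, which returns one integer per sentence. The sketch below assumes literal matching (`fixed = TRUE`), and the name `pos` is chosen only for illustration.

```r
text <- c("Sam can often be found practicing his guitar in his bedroom.",
          "When it's raining Sam will often stay home all day",
          "Sam broke his guitar")
words <- c("Sam", "guitar", "raining")

# regexpr() gives the character position of the first match, or -1 if none.
# as.integer() drops the match attributes, so sapply() simplifies the result
# to an integer matrix with one column per word.
pos <- sapply(words, function(w) as.integer(regexpr(w, text, fixed = TRUE)))

# The matrix columns become data frame columns named after the words.
df <- data.frame(text = text, pos, stringsAsFactors = FALSE)
df
```

The position columns are then ordinary integers, so they can be filtered or compared directly, for example `df[df$guitar > 0, ]`.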
Upvotes: 3