I am trying to count, for each sentence in a character vector, the words with more than 4 letters. I tried this:
fullsetence <- as.character(c("A test setence with test length","A second test for length"))
nchar(fullsetence)
I expect one count per sentence: in the example above, the first sentence has 2 words longer than 4 letters and the second also has 2.
However, nchar returns the total number of characters in each string, not the length of each word.
What is the right way to do this?
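For reference, this is what nchar() returns on the two example strings: one total per string, with spaces counted as characters, so the individual word lengths are not visible.

```r
fullsetence <- c("A test setence with test length", "A second test for length")

# nchar() counts every character in each string, spaces included
nchar(fullsetence)
# [1] 31 24
```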
library(dplyr)
library(purrr)
# vector of sentences
fullsetence <- as.character(c("A test setence with test length","A second test for length"))
# get vector of counts for words with more than 4 letters
fullsetence %>%
  strsplit(" ") %>%
  map(~sum(nchar(.) > 4)) %>%
  unlist()
# [1] 2 2
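If you would rather not load dplyr and purrr, the same count can be sketched in base R. This is an equivalent alternative, not part of the pipeline above: vapply() with an integer(1) template takes the place of map() followed by unlist() and guarantees exactly one integer per sentence.

```r
fullsetence <- c("A test setence with test length", "A second test for length")

# split each sentence on spaces, then count the words longer than 4 characters
counts <- vapply(strsplit(fullsetence, " "),
                 function(words) sum(nchar(words) > 4),
                 integer(1))
counts
# [1] 2 2
```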
# create a data frame with each sentence and its count
# reuse the previous pipeline inside mutate()
data.frame(fullsetence, stringsAsFactors = FALSE) %>%
  mutate(Counts = fullsetence %>%
           strsplit(" ") %>%
           map(~sum(nchar(.) > 4)) %>%
           unlist())
#                        fullsetence Counts
# 1 A test setence with test length      2
# 2        A second test for length      2
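Note that splitting on a single space keeps punctuation attached to words, so "dogs." counts as 5 characters. If punctuation should be ignored, one alternative is to count runs of 5 or more letters with a regular expression. This is a sketch that assumes a "word" means a run of alphabetic characters (so "don't" would be read as "don" and "t"), and count_long_words is a hypothetical helper name, not an existing function:

```r
fullsetence <- c("A test setence with test length", "A second test for length")

# hypothetical helper: counts runs of at least min_letters alphabetic
# characters per string; punctuation is never part of a match
count_long_words <- function(x, min_letters = 5) {
  pattern <- paste0("[[:alpha:]]{", min_letters, ",}")
  lengths(regmatches(x, gregexpr(pattern, x)))
}

count_long_words(fullsetence)
# [1] 2 2
count_long_words("Cats, dogs.")  # strsplit(" ") would count 2 here
# [1] 0
```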
If you want the actual words with more than 4 letters instead of their count, the same approach works:
fullsetence %>%
  strsplit(" ") %>%
  map(~ .[nchar(.) > 4])
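A base R sketch of the same extraction, offered as an alternative to the purrr version: lapply() returns a list with one character vector of long words per sentence.

```r
fullsetence <- c("A test setence with test length", "A second test for length")

# keep only the words longer than 4 characters, one vector per sentence
long_words <- lapply(strsplit(fullsetence, " "),
                     function(words) words[nchar(words) > 4])
long_words
```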
data.frame(fullsetence, stringsAsFactors = FALSE) %>%
  mutate(Words = fullsetence %>%
           strsplit(" ") %>%
           map(~ .[nchar(.) > 4]))
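The Words column built with map() is a list column holding a vector per row. If a single string per row is more convenient (for example, to write the result to a CSV file), one option is to collapse each vector with paste(). The sketch below uses base R, assuming ", " as the separator; with purrr, map_chr() would play the role of vapply(..., character(1)).

```r
fullsetence <- c("A test setence with test length", "A second test for length")
df <- data.frame(fullsetence, stringsAsFactors = FALSE)

# collapse each sentence's long words into one comma-separated string,
# so Words is a plain character column instead of a list column
df$Words <- vapply(strsplit(df$fullsetence, " "),
                   function(words) paste(words[nchar(words) > 4], collapse = ", "),
                   character(1))
df
```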