user8831872

Count the words with more than 4 letters in each sentence

I'm trying to count, for each sentence in a character vector, the words that have more than 4 letters. I tried this:

fullsetence <- as.character(c("A test setence with test length","A second test for length"))
nchar(fullsetence)

Based on the example above, I expect the first sentence to give 2 (it has 2 words longer than 4 letters) and the second sentence to give 2 as well.

However, nchar returns the total number of characters in each string, not the length of each word.

What is the right way to do this?
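A quick base R check (a minimal sketch using the same two sentences) shows why nchar() alone does not give per-word lengths: applied to the whole string it counts every character, spaces included, whereas splitting the string first yields one length per word.

```r
fullsetence <- c("A test setence with test length", "A second test for length")

# nchar() on the whole string counts every character, spaces included
whole_lengths <- nchar(fullsetence)
whole_lengths
#> [1] 31 24

# strsplit() returns a list with one character vector of words per sentence;
# nchar() is vectorised, so applying it to each vector gives one length per word
word_lengths <- lapply(strsplit(fullsetence, " "), nchar)
word_lengths
```

Comparing those per-word lengths with 4 and summing the resulting TRUE/FALSE values gives the expected counts of 2 and 2.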


Answers (1)

AntoniosK

library(dplyr)
library(purrr)

# vector of sentences
fullsetence <- as.character(c("A test setence with test length","A second test for length"))

# get vector of counts for words with more than 4 letters
fullsetence %>%
  strsplit(" ") %>%
  map(~sum(nchar(.) > 4)) %>%
  unlist()

# [1] 2 2
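If you would rather not load dplyr and purrr, the same split-then-count logic can be sketched in base R. This is an equivalent rewrite of the pipeline above, not a different method: vapply() plays the role of map() followed by unlist(), and additionally checks that each sentence yields exactly one integer.

```r
fullsetence <- c("A test setence with test length", "A second test for length")

# Split on spaces, then for each sentence count the words with more than 4 characters.
# vapply() checks that every result is a single integer, so no unlist() is needed.
counts <- vapply(strsplit(fullsetence, " "),
                 function(words) sum(nchar(words) > 4),
                 integer(1))
counts
#> [1] 2 2
```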


# create a dataframe with sentence and the corresponding counts
# reuse the previous pipeline inside mutate() to add a Counts column
data.frame(fullsetence, stringsAsFactors = FALSE) %>%
  mutate(Counts = fullsetence %>%
                   strsplit(" ") %>%
                   map(~sum(nchar(.) > 4)) %>%
                   unlist() )

#                       fullsetence Counts
# 1 A test setence with test length      2
# 2        A second test for length      2
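The code above splits on a single space and counts every character, so punctuation attached to a word counts toward its length (for example, "test," has 5 characters). If only letters should count, here is a minimal base R sketch. It assumes that "letters" means alphabetic characters as matched by [[:alpha:]], which is locale-dependent. count_long_words and its min_letters argument are hypothetical names for illustration, not part of any package.

```r
sentences <- c("A test, setence with  test length.", "A second test for length!")

# Same rule as the answer above: "test," has 5 characters, so it is counted
naive <- vapply(strsplit(sentences, " "),
                function(words) sum(nchar(words) > 4),
                integer(1))
naive
#> [1] 3 2

# Hypothetical helper: split on runs of whitespace and count only letters
count_long_words <- function(x, min_letters = 5) {
  words <- strsplit(x, "[[:space:]]+")
  vapply(words, function(w) {
    # drop every non-letter character before measuring the word
    letters_only <- gsub("[^[:alpha:]]", "", w)
    sum(nchar(letters_only) >= min_letters)
  }, integer(1))
}
count_long_words(sentences)
#> [1] 2 2
```

The function can be used inside mutate() in the same way as the pipeline above.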

If you want the actual words with more than 4 letters, you can use the same approach:

fullsetence %>%
  strsplit(" ") %>%
  map(~ .[nchar(.) > 4])

data.frame(fullsetence, stringsAsFactors = FALSE) %>%
  mutate(Words = fullsetence %>%
                 strsplit(" ") %>%
                 map(~ .[nchar(.) > 4]))
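Another base R option is to extract the long words directly with gregexpr() and regmatches(). This is a regex-based variant, not the split-on-spaces approach above. It assumes a "word" is a run of consecutive letters, so punctuation is ignored automatically, but a contraction such as "don't" would be treated as two separate runs.

```r
fullsetence <- c("A test setence with test length", "A second test for length")

# Match every run of 5 or more consecutive letters in each sentence
long_words <- regmatches(fullsetence, gregexpr("[[:alpha:]]{5,}", fullsetence))
long_words

# The counts are then just the number of matches per sentence
lengths(long_words)
#> [1] 2 2
```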
