Reputation: 25
I'm trying to find the letter count of the longest word in each sentence. Here is my code:
library(dplyr)    # for sample_n()
library(stringr)  # for str_remove_all() and the sentences data
new_data <- sample_n(data.frame(stringr::sentences), 30)
new_data
split_data <- data.frame(X = str_remove_all(new_data$stringr..sentences, "\\."))
split_data
split_data <- data.frame(X = str_remove_all(split_data$X, ","))
split_data
split_data <- strsplit(split_data$X," ")
split_data
longest = c()
i = 0
while (i < 30) {
  i = i + 1
  longest[i] <- as.list(split_data)[[i]]
  longest[i] <- tail(longest[i][order(nchar(longest[i]))], 1)
}
Upvotes: 2
Views: 120
Reputation: 389012
Base R using sapply:
sapply(split_data, function(x)
  c(word = x[which.max(nchar(x))], length = max(nchar(x)))) |>
  t() |>
  as.data.frame()
This returns:
# word length
#1 sausage 7
#2 public 6
#3 costly 6
#4 capture 7
#5 morning 7
#6 orders 6
#7 waiting 7
#8 better 6
#9 Light 5
#10 Canned 6
#11 perfect 7
#12 tropics 7
#13 Tuesday 7
#14 matters 7
#15 corner 6
#16 gently 6
#17 secret 6
#18 crowded 7
#19 level 5
#20 crooked 7
#21 abrupt 6
#22 little 6
#23 pockets 7
#24 through 7
#25 turkey 6
#26 filing 6
#27 tumbled 7
#28 finish 6
#29 drifts 6
#30 before 6
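One thing to be aware of: c(word = ..., length = ...) mixes a string and a number, so the length column comes back as character rather than numeric. A minimal sketch that keeps it as integer, assuming split_data is a list of character vectors as produced by strsplit() in the question. The two input sentences and the helper name longest_word are made up for illustration:

```r
# Hypothetical input standing in for split_data from the question
split_data <- strsplit(c("The quick brown fox", "Jumps over a lazy dog"), " ")

# Illustrative helper (not from the original answer):
# returns one row per sentence with the longest word and its length
longest_word <- function(words_list) {
  n <- lapply(words_list, nchar)            # letter counts per sentence
  idx <- vapply(n, which.max, integer(1))   # position of the first longest word
  data.frame(
    word = vapply(seq_along(words_list),
                  function(i) words_list[[i]][idx[i]],
                  character(1)),
    length = vapply(n, max, integer(1)),    # stays integer, not character
    stringsAsFactors = FALSE
  )
}

result <- longest_word(split_data)
result
```

Building the data frame column by column avoids the character coercion, so length can be summed or sorted numerically afterwards.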
Upvotes: 2
Reputation: 5214
Assuming the only non-letter characters you need to clean from your corpus are "." and ",", you could use the following. You can also pull the actual words by subsetting each sentence with which.max(). Here you have to be careful of ties, though.
library(tidyverse)
set.seed(1)
corpus <- sample(stringr::sentences, 5)
corpus
#> [1] "No doubt about the way the wind blows."
#> [2] "Feel the heat of the weak dying flame."
#> [3] "Take shelter in this tent, but keep still."
#> [4] "The kite flew wildly in the high wind."
#> [5] "The barrel of beer was a brew of malt and hops."
# length of longest words
corpus %>%
  str_remove_all("[.,]") %>%
  str_split(" ") %>%
  lapply(nchar) %>%
  lapply(max) %>%
  unlist()
#> [1] 5 5 7 6 6
# pull actual longest words
corpus %>%
  str_remove_all("[.,]") %>%
  str_split(" ") %>%
  {map2({.},
        {.} %>% lapply(nchar) %>%
          lapply(which.max) %>%
          unlist(),
        `[`)} %>%
  unlist()
#> [1] "doubt" "dying" "shelter" "wildly" "barrel"
Created on 2022-02-02 by the reprex package (v2.0.1)
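On the ties caveat: which.max() returns only the first position of the maximum, which is why the first sentence above yields "doubt" even though "about" and "blows" are just as long. A minimal sketch that keeps every tied word instead, using base R only. The helper name longest_words is made up for illustration, and it assumes, as above, that "." and "," are the only punctuation to strip:

```r
# Illustrative helper (not from the original answer):
# returns all words tied for longest in a single sentence
longest_words <- function(sentence) {
  # strip "." and ",", then split on single spaces
  words <- strsplit(gsub("[.,]", "", sentence), " ", fixed = TRUE)[[1]]
  n <- nchar(words)
  words[n == max(n)]   # keep every word whose length equals the maximum
}

ties <- longest_words("No doubt about the way the wind blows.")
ties
#> [1] "doubt" "about" "blows"
```

To run it over the whole corpus, something like lapply(corpus, longest_words) should give a list with one character vector per sentence.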
Upvotes: 2