I'm trying to find the letter count of the longest word in each sentence in R

I'm trying to find the letter count of the longest word in each sentence. Here is my code:

library(dplyr)    # sample_n()
library(stringr)  # str_remove_all(), sentences

new_data <- sample_n(data.frame(stringr::sentences), 30)
new_data

split_data <- data.frame(X = str_remove_all(new_data$stringr..sentences, "\\."))
split_data

split_data <- data.frame(X = str_remove_all(split_data$X, ","))
split_data

split_data <- strsplit(split_data$X," ")
split_data

longest = c()
i=0

while(i<30){
   i = i + 1
   longest[i] <- as.list(split_data)[[i]]
   longest[i] <- tail(longest[i][order(nchar(longest[i]))], 1)
}

Upvotes: 2

Views: 120

Answers (2)

Ronak Shah

Reputation: 389012

In base R, using sapply() (the native pipe |> requires R 4.1 or later):

sapply(split_data, function(x) 
  c(word = x[which.max(nchar(x))], length = max(nchar(x)))) |>
  t() |>
  as.data.frame()

This returns:

#      word length
#1  sausage      7
#2   public      6
#3   costly      6
#4  capture      7
#5  morning      7
#6   orders      6
#7  waiting      7
#8   better      6
#9    Light      5
#10  Canned      6
#11 perfect      7
#12 tropics      7
#13 Tuesday      7
#14 matters      7
#15  corner      6
#16  gently      6
#17  secret      6
#18 crowded      7
#19   level      5
#20 crooked      7
#21  abrupt      6
#22  little      6
#23 pockets      7
#24 through      7
#25  turkey      6
#26  filing      6
#27 tumbled      7
#28  finish      6
#29  drifts      6
#30  before      6
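
One thing to note about the approach above: because c() combines a word and a number into a single character vector, the length column comes back as character rather than numeric. Below is a minimal sketch of one way to keep it numeric, assuming the input is a list of character vectors like split_data. The names words_by_sentence and longest_words are hypothetical and used only for illustration.

```r
# Hypothetical input standing in for split_data: a list of character vectors,
# one per sentence, as produced by strsplit().
words_by_sentence <- strsplit(
  c("The birch canoe slid on the smooth planks",
    "Glue the sheet to the dark blue background"),
  " "
)

# Build each column separately so the word stays character
# and the length stays integer.
longest_words <- data.frame(
  word   = vapply(words_by_sentence,
                  function(x) x[which.max(nchar(x))], character(1)),
  length = vapply(words_by_sentence,
                  function(x) max(nchar(x)), integer(1))
)

longest_words
```

vapply() is used here instead of sapply() so that the type of each column is checked explicitly. As with which.max() generally, only the first of several equally long words is kept.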

Upvotes: 2

Dan Adams

Reputation: 5214

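Assuming the only non-letter characters you need to remove from your corpus are "." and ",", you could use the following. You can also pull the actual words by subsetting each sentence with which.max(). Be careful with ties, though: which.max() returns only the position of the first maximum, so if several words share the longest length, only the first one is returned.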

library(tidyverse)

set.seed(1)
corpus <- sample(stringr::sentences, 5) 
corpus
#> [1] "No doubt about the way the wind blows."         
#> [2] "Feel the heat of the weak dying flame."         
#> [3] "Take shelter in this tent, but keep still."     
#> [4] "The kite flew wildly in the high wind."         
#> [5] "The barrel of beer was a brew of malt and hops."

# length of longest words
corpus %>% 
  str_remove_all("[.,]") %>% 
  str_split(" ") %>% 
  lapply(nchar) %>% 
  lapply(max) %>% 
  unlist()
#> [1] 5 5 7 6 6
  
# pull actual longest words
corpus %>% 
  str_remove_all("[.,]") %>% 
  str_split(" ") %>% 
  {map2({.}, 
       {.} %>% lapply(nchar) %>% 
         lapply(which.max) %>% 
         unlist(),
       `[`)} %>% 
  unlist()
#> [1] "doubt"   "dying"   "shelter" "wildly"  "barrel"

Created on 2022-02-02 by the reprex package (v2.0.1)
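
To illustrate the point about ties, here is a small base R sketch that is not part of the reprex above. It assumes a single hypothetical sentence in which two words share the longest length, and compares which.max() with a comparison against max().

```r
# Hypothetical example sentence with a tie for the longest word.
sentence <- "The birch canoe slid on the smooth planks."

# Strip "." and "," and split on spaces, mirroring the cleaning step above.
words <- strsplit(gsub("[.,]", "", sentence), " ")[[1]]
lens <- nchar(words)

# which.max() reports only the first position of the maximum.
first_longest <- words[which.max(lens)]

# Comparing against max() keeps every word tied for longest.
all_longest <- words[lens == max(lens)]

first_longest
all_longest
```

Here first_longest holds only "smooth", while all_longest holds both "smooth" and "planks". Which behaviour is appropriate depends on whether you need exactly one word per sentence.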

Upvotes: 2
