Vaibhav Singh

Reputation: 1209

Find lengthiest word in a column containing multiple words in R

I am working with the airports dataset from the nycflights13 package. I want to find the longest word in the second column, i.e. name.

I tried two ways:

  1. Using strsplit with the boundary function from stringr on airports$name, but somehow I was not able to get this to work.

  2. Using the word function, but it only takes the first word from the name.

     library(tidyverse)
     library(nycflights13)
     airport <- nycflights13::airports
    
     strsplit(word(airport$name),boundary("word"))
    

Upvotes: 1

Views: 204

Answers (2)

Vaibhav Singh

Reputation: 1209

@ian-campbell's answer is brilliant. While browsing I came across another, simpler option that gets a similar result (in case anyone comes across this question later):

library(tidyverse)
library(nycflights13)
airport <- nycflights13::airports

# one row per word, then sort by word length
airport %>%
  separate_rows(name, sep = ' ') %>%
  mutate(len = nchar(name)) %>%
  select(name, len) %>%
  arrange(desc(len))

Another possible answer that I came up with while learning from Ian's answer (also, I am on fire, damn!):

airport %>%
  mutate(longest = map(strsplit(name," "),~ nchar(.x))) %>% 
  unnest(longest) %>% 
  arrange(desc(longest))
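
The snippet above stores only the word lengths in the longest column, so the words themselves are lost after unnest. A minimal sketch that keeps each word next to its length is shown below. It assumes words are separated by single spaces; sample_df, word, len and word_lengths are illustrative names, and the two airport names are just a small stand-in for the full airport table.

```r
library(tidyverse)

# Small illustrative tibble; swap in `airport` for the real data
sample_df <- tibble(name = c("Lansdowne Airport",
                             "Elizabethton Municipal Airport"))

word_lengths <- sample_df %>%
  mutate(word = strsplit(name, " ")) %>%  # list column: one character vector per name
  unnest(word) %>%                        # one row per word
  mutate(len = nchar(word)) %>%           # length of each word
  arrange(desc(len))                      # longest words first

word_lengths
```

The first row then holds the longest word in the whole column together with the name it came from.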

Upvotes: 0

Ian Campbell

Reputation: 24838

Here's an approach with purrr::map. First, split the name column on spaces with strsplit, which creates a list with one character vector of words per name. Then apply a custom function to each element of that list. We can use [ to subset the vector in each list element down to its longest word. We can get the length of each word by applying nchar, and which.max tells us the position of the longest one (the first one, if there is a tie).

The _chr version of map (map_chr) will return a character vector.

library(tidyverse)
library(nycflights13)
airport <- nycflights13::airports
airport %>%
   mutate(longest = map_chr(strsplit(name," "),
                            ~ .x[which.max(nchar(.x))]),
          wordlength = nchar(longest)) %>%
   select(name,longest,wordlength)
## A tibble: 1,458 x 3
#   name                           longest      wordlength
#   <chr>                          <chr>             <int>
# 1 Lansdowne Airport              Lansdowne             9
# 2 Moton Field Municipal Airport  Municipal             9
# 3 Schaumburg Regional            Schaumburg           10
# 4 Randall Airport                Randall               7
# 5 Jekyll Island Airport          Airport               7
# 6 Elizabethton Municipal Airport Elizabethton         12
# 7 Williams County Airport        Williams              8
# 8 Finger Lakes Regional Airport  Regional              8
# 9 Shoestring Aviation Airfield   Shoestring           10
#10 Jefferson County Intl          Jefferson             9
## … with 1,448 more rows
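
To see what each step does in isolation, here is a minimal base R sketch on three names copied from the output above. The vector sample_names and the helper longest_word are illustrative names, not part of the pipeline above. The last lines flatten every word into one vector to pick the single longest word in the column, again assuming names are split on single spaces.

```r
# A few airport names taken from the output above (illustrative subset)
sample_names <- c("Lansdowne Airport",
                  "Moton Field Municipal Airport",
                  "Elizabethton Municipal Airport")

# Hypothetical helper: longest word of one name (first one wins on ties)
longest_word <- function(x) {
  words <- strsplit(x, " ")[[1]]      # split the name into words
  words[which.max(nchar(words))]      # keep the word with the most characters
}

# Longest word per name, as a plain character vector
per_name <- vapply(sample_names, longest_word, character(1), USE.NAMES = FALSE)
per_name

# Longest word across the whole column
all_words <- unlist(strsplit(sample_names, " "))
overall <- all_words[which.max(nchar(all_words))]
overall
```

On the full data, passing airport$name in place of sample_names should work the same way.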

Upvotes: 3
