gazel alaryan

Reputation: 11

How do I take random sentences in R, count the number of characters per word, and sort the words by those counts?

First, I want to select 5 or 6 sentences at random. Then I want to write a function that finds the number of letters in every word of a given text and sorts the words by that number, from the shortest to the longest. Words with the same number of letters should be sorted alphabetically.

[1] "We find joy in the simplest things. He wrote down a long list of items. The hail pattered on the burnt brown grass. Screen the porch with woven straw mats. The theft of the pearl pin was kept secret. Sweet words work better than fierce." 

The function should return a result like this:

[1] "a he in of of on we joy pin the the the the the the was down find hail kept list long mats than with work brown burnt grass items pearl porch straw sweet theft words woven wrote better fierce screen secret things pattered simplest" 

Upvotes: 0

Views: 56

Answers (3)

Ric

Reputation: 5722

library(tokenizers)

text =  "We find joy in the simplest things. He wrote down a long list of items. The hail pattered on the burnt brown grass. Screen the porch with woven straw mats. The theft of the pearl pin was kept secret. Sweet words work better than fierce."

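sort_count <- function(s){
  # tokenize_words() lowercases and strips punctuation by default
  words <- tokenize_words(s, simplify = TRUE)
  # sort by word length, then alphabetically within each length
  words[order(nchar(words), words)]
}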
 
sort_count(text)
#>  [1] "a"        "he"       "in"       "of"       "of"       "on"      
#>  [7] "we"       "joy"      "pin"      "the"      "the"      "the"     
#> [13] "the"      "the"      "the"      "was"      "down"     "find"    
#> [19] "hail"     "kept"     "list"     "long"     "mats"     "than"    
#> [25] "with"     "work"     "brown"    "burnt"    "grass"    "items"   
#> [31] "pearl"    "porch"    "straw"    "sweet"    "theft"    "words"   
#> [37] "woven"    "wrote"    "better"   "fierce"   "screen"   "secret"  
#> [43] "things"   "pattered" "simplest"
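
The question also asks for 5 or 6 randomly chosen sentences, which the code above does not cover. A minimal sketch using base R's sample(), assuming the candidate sentences are held in a character vector (pool, n_pick, picked and random_text are illustrative names, not from the original post):

```r
# Hypothetical pool of candidate sentences (here: the six from the question).
pool <- c(
  "We find joy in the simplest things.",
  "He wrote down a long list of items.",
  "The hail pattered on the burnt brown grass.",
  "Screen the porch with woven straw mats.",
  "The theft of the pearl pin was kept secret.",
  "Sweet words work better than fierce."
)

set.seed(42)                          # only to make the draw reproducible
n_pick <- sample(5:6, 1)              # decide at random whether to take 5 or 6
picked <- sample(pool, n_pick)        # draw that many sentences without replacement
random_text <- paste(picked, collapse = " ")  # join them into one string
```

random_text is a single string and can be handed to the sorting step. The pool must contain at least 6 sentences, otherwise sample() fails when 6 are requested.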

Upvotes: 2

I_O

Reputation: 6911

One approach with base R:

sentence <- 
"We find joy in the simplest things. He wrote down a long list of items. 
The hail pattered on the burnt brown grass. Screen the porch with woven straw mats.
The theft of the pearl pin was kept secret. Sweet words work better than fierce."

sentence |>
    strsplit('\\W+?') |> ## split at non-word characters
    unlist() |>
    (\(.) .[. != ""])() |> ## remove empty strings
    (\(.) .[order(nchar(.), .)])() |> ## sort by string length and alphabet
    paste(collapse = ' ')

Output:

[1] "a He in of of on We joy pin the the the the The The was down find hail kept list long mats than with work brown burnt grass items pearl porch straw Sweet theft words woven wrote better fierce Screen secret things pattered simplest"

Note the possibly unfamiliar notation (\(.) ...)(). This is shorthand, available since R 4.1.0, for defining an anonymous function and calling it immediately:

  • function(x){...} can be written as \(x){...}
  • (\(x){...})() defines and calls the function; inside a ... |> ... |> pipeline, x receives the piped value
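
As a small illustration of the two points above (a sketch only; add_one_long, add_one_short and kept are made-up names):

```r
# Requires R >= 4.1.0 for both the native pipe |> and the \(x) shorthand.

# The two definitions below are equivalent.
add_one_long  <- function(x) { x + 1 }
add_one_short <- \(x) { x + 1 }

add_one_long(1)
#> [1] 2
add_one_short(1)
#> [1] 2

# Define and call an anonymous function inside a pipeline:
# the piped vector arrives as the argument named `.`
kept <- c("b", "", "a") |> (\(.) .[. != ""])()
kept
#> [1] "b" "a"
```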

Upvotes: 2

Allan Cameron

Reputation: 173928

A similar base R approach:

str <- "We find joy in the simplest things. He wrote down a long list of items.
        The hail pattered on the burnt brown grass. Screen the porch with woven
        straw mats. The theft of the pearl pin was kept secret. 
        Sweet words work better than fierce."

# drop the trailing punctuation first, so the last word is not left as "fierce."
words <- strsplit(sub("[[:punct:]]+$", "", str), "[[:punct:]]?\\s+[[:punct:]]?")[[1]]

split(words, nchar(words)) |>
  lapply(sort) |>
  unlist() |>
  paste(collapse = " ")

#> [1] "a He in of of on We joy pin the the the the The The was down find hail
#> kept list long mats than with work brown burnt grass items pearl porch 
#> straw Sweet theft words woven wrote better fierce Screen secret things 
#> pattered simplest"
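
The expected result in the question is all lowercase, and how sort() treats capital letters depends on the session locale. A possible base R wrapper that lowercases first and orders independently of the locale is sketched below; sort_words_by_length is a hypothetical name, and the word-splitting rule (runs of non-word characters) is an assumption about what counts as a word:

```r
# Hypothetical helper: lowercase the text, split it into words,
# then sort by word length and alphabetically within each length.
sort_words_by_length <- function(txt) {
  words <- strsplit(tolower(txt), "\\W+")[[1]]  # split at runs of non-word characters
  words <- words[nzchar(words)]                 # drop empty strings, if any
  # method = "radix" compares strings in the C locale, so the result
  # does not depend on the locale of the R session
  sorted <- words[order(nchar(words), words, method = "radix")]
  paste(sorted, collapse = " ")
}

sort_words_by_length("The cat sat. A dog!")
#> [1] "a cat dog sat the"
```

Lowercasing before sorting sidesteps the question of whether "Screen" sorts before or after "secret", which otherwise varies between locales.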

Upvotes: 2
