K-J

Reputation: 109

How do I take a list of words in R and count the number of characters per word and store the frequency of counts in an array?

I have a multi-step problem that I am struggling to solve as a new student of R.

Step 1. I have a character vector of 4 sentences whose words are delimited by the space character (" "), and I need to split those 4 sentences into a list of words. I seem to have done this part OK:

list <- strsplit(text, split = " ") 
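To see what Step 1 produces, here is a minimal sketch that assumes `text` is the four-sentence vector shown further down. It uses the name `words` instead of `list` so that the base function `list()` is not masked:

```r
text <- c("Three blind mice",
          "Three blind mice",
          "See how they run see how they run",
          "They all ran after the farmers wife who cut off their heads with a carving knife")

# strsplit() returns a list with one character vector of words per sentence
words <- strsplit(text, split = " ")

length(words)   # 4: one element per sentence
lengths(words)  # 3 3 8 16: number of words in each sentence
words[[1]]      # "Three" "blind" "mice"
```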

Step 2. I now have my list of 4, and I need to iterate through it, find all of the words in 'sentence 1' with 1 character, count them and store that count in a specific element of an array I have created. I need to do the same for every word length in that first sentence until there are none left, and then move on to the next sentence in the list. I seem to have created the array OK:

array_output <- array(dim=c(9, 4, 1))
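One thing to watch: `array(dim = ...)` with no data fills every cell with `NA`, and `NA + 1` is still `NA`, so counting by incrementing cells will not work unless the array starts at zero. A minimal sketch, assuming rows index word length (1 to 9) and columns index sentence, with the third dimension of size 1 kept only to match the question:

```r
# array(dim = ...) alone fills every cell with NA
a_na <- array(dim = c(9, 4, 1))
a_na[5, 1, 1] + 1   # NA: incrementing an NA cell stays NA

# Start from zero instead so the cells can act as counters
array_output <- array(0L, dim = c(9, 4, 1))
array_output[5, 1, 1] <- array_output[5, 1, 1] + 1L
array_output[5, 1, 1]   # 1
dim(array_output)       # 9 4 1
```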

I also seem to have created a loop that goes through each sentence and counts the number of characters per word.

for (i in list) {
  elements <- nchar(i)
  print(elements)
}
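Because `nchar()` is vectorised, each pass of that loop already returns the lengths of all words in one sentence at once. As a sketch (again using the hypothetical name `words` for the result of `strsplit()`), `lapply()` collects those per-sentence results into a list instead of printing them:

```r
text <- c("Three blind mice",
          "Three blind mice",
          "See how they run see how they run",
          "They all ran after the farmers wife who cut off their heads with a carving knife")
words <- strsplit(text, split = " ")

# nchar() works on a whole character vector, so one call per sentence is enough
word_lengths <- lapply(words, nchar)

word_lengths[[1]]  # 5 5 4
word_lengths[[3]]  # 3 3 4 3 3 3 4 3
```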

But I am stuck trying to get each character count from the relevant sentence into the right spot in the array.
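One way to finish the loop approach is to loop over sentence positions with `seq_along()` rather than over the sentences themselves, so the column index is known, and then increment the row that matches each word's length. This is a minimal sketch that assumes rows are word lengths and columns are sentences (the question does not say which way round), uses the hypothetical name `words` in place of `list`, and assumes no word is longer than 9 characters (true here, where the longest words have 7):

```r
text <- c("Three blind mice",
          "Three blind mice",
          "See how they run see how they run",
          "They all ran after the farmers wife who cut off their heads with a carving knife")
words <- strsplit(text, split = " ")

# Start from zero, not NA, so the cells can be incremented
array_output <- array(0L, dim = c(9, 4, 1))

for (i in seq_along(words)) {          # i = sentence number = column
  for (len in nchar(words[[i]])) {     # len = length of one word = row
    array_output[len, i, 1] <- array_output[len, i, 1] + 1L
  }
}

array_output[, , 1]
```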

I am certain there is a much easier way to do this; however, I am well and truly stuck.

Here is the original input I am working with:

text <- c("Three blind mice", 
          "Three blind mice", 
          "See how they run see how they run",
          "They all ran after the farmers wife who cut off their heads with a carving knife")

Upvotes: 0

Views: 650

Answers (1)

Limey

Reputation: 12586

Here's a solution based on the tidyverse. Comments in the code explain each step.

library(tidyverse)

# Test data.  Add an extra column to uniquely identify original rows.  
# This is necessary because of the duplication in the first two rows.
df <- tibble(row=1:4,
             text=c("Three blind mice", 
                    "Three blind mice", 
                    "See how they run see how they run",
                    "They all ran after the farmers wife who cut off their heads with a carving knife"))

df %>% 
  # Split each sentence into words; strsplit() already returns a list, giving a list column
  mutate(word=strsplit(text, " ")) %>% 
  # Turn the data set into long format, one row per word, not one row per sentence
  unnest(word) %>% 
  # Calculate the length of each word
  mutate(word_length=str_length(word)) %>% 
  # Group by word length within original row
  group_by(row, word_length) %>% 
  # Calculate frequencies
  summarise(count=n())

which gives

# A tibble: 11 x 3
# Groups:   row [4]
     row word_length count
   <int>       <int> <int>
 1     1           4     1
 2     1           5     2
 3     2           4     1
 4     2           5     2
 5     3           3     6
 6     3           4     2
 7     4           1     1
 8     4           3     6
 9     4           4     3
10     4           5     4
11     4           7     2
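The result above is in long format. If the goal is the 9 x 4 grid from the question (word lengths down the rows, sentences across the columns, zeros where a length does not occur), one possible base R sketch is to build the same word/row pairs and cross-tabulate them with `table()`. Making `word_length` a factor with levels 1 to 9 keeps lengths that never occur; the final reshape into a 9 x 4 x 1 array only mirrors the question's `array_output`:

```r
text <- c("Three blind mice",
          "Three blind mice",
          "See how they run see how they run",
          "They all ran after the farmers wife who cut off their heads with a carving knife")
words <- strsplit(text, split = " ")

# One entry per word: which sentence it came from and how long it is
long <- data.frame(
  row = rep(seq_along(words), lengths(words)),
  word_length = nchar(unlist(words))
)

# Cross-tabulate; the factor levels keep lengths 1..9 even when their count is 0
counts <- table(factor(long$word_length, levels = 1:9), long$row)

# Copy the counts (column-major order) into the question's array shape
array_output <- array(as.integer(counts), dim = c(9, 4, 1))
counts
```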

Generally speaking, if you're working in R and thinking "I need to use a loop", stop: there's probably a vectorised way to do it.
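As an illustration of that advice applied to the question's loop, here is a loop-free sketch in base R. `tabulate()` counts how often each integer from 1 to `nbins` occurs, and `vapply()` runs it on every sentence and binds the results as columns. The choice of `nbins = 9` simply follows the question's array size; note that `tabulate()` silently ignores values outside 1..`nbins`, so any word longer than 9 characters would be dropped:

```r
text <- c("Three blind mice",
          "Three blind mice",
          "See how they run see how they run",
          "They all ran after the farmers wife who cut off their heads with a carving knife")
words <- strsplit(text, split = " ")

# One column per sentence, one row per word length 1..9
counts <- vapply(words,
                 function(w) tabulate(nchar(w), nbins = 9),
                 integer(9))

# Wrap in the 9 x 4 x 1 shape used in the question
array_output <- array(counts, dim = c(9, 4, 1))
counts
```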

Upvotes: 0
