Reputation: 109
I have a multi-step problem that I am struggling to solve as a new student of R.
Step 1. I have a character vector with sentences that are delimited by the " character and I need to split those 4 sentences into a list. I seem to have done this part OK:
list <- strsplit(text, split = " ")
Step 2. I now have my list of 4 and I need to iterate through the list, find all of the words in 'sentence 1' with 1 character, count them, and output the count into a specific element of an array I have created. I need to do the same for all words in that first sentence until there are none left, and then move on to the next sentence in the list. I have created the array OK, it seems:
array_output <- array(dim=c(9, 4, 1))
I also seem to have created a loop that goes through each sentence and counts the number of characters per word.
for(i in list[]){
elements <- nchar(i)
print(elements)
}
But I am stuck trying to get each character count from the relevant sentence into the right spot in the array.
I am certain there is a much easier way to do this; however, I am well and truly stuck.
Here is the original input I am working with:
text <- c("Three blind mice",
"Three blind mice",
"See how they run see how they run",
"They all ran after the farmers wife who cut off their heads with a carving knife")
Upvotes: 0
Views: 650
Reputation: 12586
Here's a solution based on the tidyverse. Comments in the code explain each step.
library(tidyverse)
# Test data. Add an extra column to uniquely identify original rows.
# This is necessary because of the duplication in the first two rows.
df <- tibble(row=1:4,
text=c("Three blind mice",
"Three blind mice",
"See how they run see how they run",
"They all ran after the farmers wife who cut off their heads with a carving knife"))
df %>%
# Split the sentence into words and turn the words into a list
mutate(word=as.list(strsplit(text, " "))) %>%
# Turn the data set into long format, one row per word, not one row per sentence
unnest(word) %>%
# Calculate the length of each word
mutate(word_length=str_length(word)) %>%
# Group by word length within original row
group_by(row, word_length) %>%
# Calculate frequencies
summarise(count=n())
which gives
# A tibble: 11 x 3
# Groups:   row [4]
     row word_length count
   <int>       <int> <int>
 1     1           4     1
 2     1           5     2
 3     2           4     1
 4     2           5     2
 5     3           3     6
 6     3           4     2
 7     4           1     1
 8     4           3     6
 9     4           4     3
10     4           5     4
11     4           7     2
Generally speaking, if you're working in R and thinking "I need to use a loop", stop. There's probably a better way to do it.
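If the goal is specifically the 9 x 4 x 1 array from the question, a minimal base R sketch without an explicit loop might look like the following. It assumes that rows index word length (1 to 9) and columns index sentences, which the question implies but does not state. The names `words` and `counts` are illustrative only.

```r
# Input from the question
text <- c("Three blind mice",
          "Three blind mice",
          "See how they run see how they run",
          "They all ran after the farmers wife who cut off their heads with a carving knife")

# One character vector of words per sentence
words <- strsplit(text, " ")

# For each sentence, count how many words have length 1, 2, ..., 9.
# tabulate() returns a fixed-length vector, so sapply() yields a 9 x 4 matrix
# (assumed layout: rows = word length, columns = sentence).
counts <- sapply(words, function(w) tabulate(nchar(w), nbins = 9))

# Reshape into the 9 x 4 x 1 array created in the question
array_output <- array(counts, dim = c(9, 4, 1))
```

With this layout, `array_output[5, 1, 1]` holds the number of 5-letter words in the first sentence. Note that `tabulate()` silently ignores any word longer than `nbins`, so `nbins` would need to grow for longer words.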
Upvotes: 0