kushaan gulati

Reputation: 38

Find the response with the most words

I'm trying to find the respondent in the list g.list whose response has the most words (g.list is a list of responses named by respondent ID). However, g.list contains nested lists holding the actual responses, so lapply() with length returns 1 for each of them. I'm struggling to deal with this, and I'd ideally like to solve it using the lapply() and strsplit() functions. Here's my code:

head(g.list)

Output:

$`1590553444`
[1] "Nothing"

$`1590610566`
[1] "Couldn't sit in a lot of them"

$`1590609253`
[1] "N/a"

Code:

g.split <- lapply(unlist(g.list), strsplit, " ")
head(g.split)

Output:

$`1590553444`
$`1590553444`[[1]]
[1] "Nothing"


$`1590610566`
$`1590610566`[[1]]
[1] "Couldn't" "sit"      "in"       "a"        "lot"      "of"       "them"    


$`1590609253`
$`1590609253`[[1]]
[1] "N/a"

Code:

g.count <- lapply(unlist(g.split), length)
head(g.count)

Output:

$`1590553444`
[1] 1

$`1590610566`
[1] 1

$`1590609253`
[1] 1

Code:

max(unlist(g.count))

I was expecting g.count <- lapply(unlist(g.split), length) to give the number of words. However, all of them are 1.

Upvotes: 1

Views: 70

Answers (2)

akrun

Reputation: 887213

If there is only one entry per list element, unlist g.list, strsplit the resulting vector (which returns a list of vectors), and then use lengths to count the words in each response:

out <- lengths(strsplit(unlist(g.list), " "))
out
1590553444 1590610566 1590609253 
         1          7          1 

Then use which.max to get the index and extract the element with the max count

g.list[which.max(out)]
$`1590610566`
[1] "Couldn't sit in a lot of them"
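The two steps above could also be wrapped into a single helper. The following is only a minimal sketch: the name longest_response is hypothetical (it does not come from any package), and it assumes each list element holds exactly one string.

```r
# Sample data, same shape as g.list in the question
g.list <- list(
  `1590553444` = "Nothing",
  `1590610566` = "Couldn't sit in a lot of them",
  `1590609253` = "N/a"
)

# Hypothetical helper: returns the ID, word count and text of the longest response.
# Assumes one string per list element; on ties, which.max() keeps the first maximum.
longest_response <- function(responses) {
  texts  <- unlist(responses)              # named character vector
  counts <- lengths(strsplit(texts, " "))  # words per response, names preserved
  i <- which.max(counts)                   # position of the largest count
  list(id = names(counts)[i], words = counts[[i]], text = texts[[i]])
}

res <- longest_response(g.list)
res$id     # ID of the respondent with the most words
res$words  # their word count
```

Returning the ID together with the count avoids a second lookup into g.list when only the respondent is needed.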

Another option is str_count from stringr

library(stringr)
str_count(g.list, "\\S+")
[1] 1 7 1
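One difference worth noting: the pattern "\\S+" counts runs of non-whitespace characters, while splitting on a single " " produces empty strings when a response has repeated or leading spaces. A base R sketch of a split that is less sensitive to this, using made-up sample strings (not from the question), might look like:

```r
# Hypothetical sample strings with irregular spacing
x <- c(a = "one  two", b = " leading space", c = "plain words here")

# Splitting on a single space keeps empty strings for the extra spaces
naive <- lengths(strsplit(x, " "))

# Trim both ends, then split on runs of whitespace
robust <- lengths(strsplit(trimws(x), "\\s+"))

naive   # a = 3, b = 3, c = 3
robust  # a = 2, b = 2, c = 3
```

For clean, single-spaced responses like those in the question, both approaches give the same counts.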

data

g.list <- list(`1590553444` = "Nothing",
               `1590610566` = "Couldn't sit in a lot of them",
               `1590609253` = "N/a")

Upvotes: 2

SpikyClip

Reputation: 162

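The issue here is where to place the [[1]]. strsplit() returns a list with one element per input string, which is why length() returned 1; indexing the result with [[1]] gives the character vector of words, whose length is the word count.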
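As a quick illustration of that point, here is a small sketch using one of the responses from the question:

```r
parts <- strsplit("Couldn't sit in a lot of them", " ")

length(parts)       # 1: a list holding one character vector
length(parts[[1]])  # 7: the words themselves
lengths(parts)      # 7: length of each list element
```

lengths() applies length() to every element of the list, so it gives the per-response count without an explicit [[1]].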

# create data
g.list = list(
    `1590553444` = "Nothing",
    `1590610566` = "Couldn't sit in a lot of them",
    `1590609253` = "N/a"
)

# solution
get_len = function(string) {
    length(strsplit(string, " ")[[1]])
}

lapply(g.list, get_len)

$`1590553444`
[1] 1

$`1590610566`
[1] 7

$`1590609253`
[1] 1

To get the max:

max(unlist(lapply(g.list, get_len)))

[1] 7

Upvotes: 1
