Reputation: 500
Hoping to get some advice on how to map a function that returns a vector of readability scores for a sentence (eventually going to bind them all). I've tried two different ways, but have only figured out how to get it with a for loop so far.
library(quanteda.textstats)
haiku_df <- data.frame(id = c(1, 2, 3),
                       sentences = c("Mapping a function",
                                     "Can sometimes leave me with more",
                                     "Questions than answers"))
I thought this would return one row of scores per list element, but instead each element repeats the scores nrow(haiku_df) times:
scores <- function(text, id) {
  flesch_score <- textstat_readability(text, measure = "Flesch")$Flesch
  fog_score <- textstat_readability(text, measure = "FOG")$FOG
  row <- data.frame(id, flesch_score, fog_score)
  row
}
score_df <- list()
score_df <- lapply(haiku_df$sentences, scores, haiku_df$id)
score_df
> score_df
[[1]]
id flesch_score fog_score
1 1 62.79 1.2
2 2 62.79 1.2
3 3 62.79 1.2
[[2]]
id flesch_score fog_score
1 1 102.045 2.4
2 2 102.045 2.4
3 3 102.045 2.4
[[3]]
id flesch_score fog_score
1 1 62.79 1.2
2 2 62.79 1.2
3 3 62.79 1.2
This is in the right direction but still incorrect (adding n as an argument):
scores2 <- function(text, id, n) {
  flesch_score <- textstat_readability(text[n], measure = "Flesch")$Flesch
  fog_score <- textstat_readability(text[n], measure = "FOG")$FOG
  row <- data.frame(id[n], flesch_score, fog_score)
  row
}
score2_df <- list()
score2_df <- lapply(haiku_df$sentences, scores2, haiku_df$id, n = 1:nrow(haiku_df))
> score2_df
[[1]]
id.n. flesch_score fleschkincaid_score
1 1 62.79 5.246667
2 2 NA NA
3 3 NA NA
[[2]]
id.n. flesch_score fleschkincaid_score
1 1 102.045 0.5166667
2 2 NA NA
3 3 NA NA
[[3]]
id.n. flesch_score fleschkincaid_score
1 1 62.79 5.246667
2 2 NA NA
3 3 NA NA
The trusty for loop gets me what I want, but is obviously slower when scaled up.
score3_df <- list()
for (i in 1:nrow(haiku_df)) {
  score3_df[[i]] <- scores(haiku_df$sentences[i], haiku_df$id[i])
}
> dplyr::bind_rows(score3_df)
id flesch_score fog_score
1 1 62.790 1.2
2 2 102.045 2.4
3 3 62.790 1.2
Feel like I'm overlooking something really simple, but can't seem to figure it out. Thanks!
Upvotes: 2
Views: 56
Reputation: 887038
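We need Map instead of lapply, because the function has to be called on each element of 'sentences' together with the corresponding 'id'. lapply iterates over 'sentences' only and passes the whole 'id' vector to every call, which is why each result has nrow(haiku_df) rows.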
do.call(rbind, Map(scores, haiku_df$sentences, haiku_df$id))
#                                  id flesch_score fog_score
# Mapping a function                1       62.790       1.2
# Can sometimes leave me with more  2      102.045       2.4
# Questions than answers            3       62.790       1.2
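To see the difference between the two without needing quanteda.textstats, here is a minimal sketch. toy_scores() is a hypothetical stand-in for scores() (it only counts characters), so the numbers mean nothing; the point is the number of rows each call produces.

```r
# Hypothetical stand-in for scores(): no quanteda dependency, it just
# counts characters so the row structure is easy to inspect.
toy_scores <- function(text, id) {
  data.frame(id = id, n_char = nchar(text))
}

sentences <- c("Mapping a function", "Can sometimes leave me with more")
ids <- c(1, 2)

# lapply() iterates over `sentences` only; `ids` is passed whole to every
# call, so each result has one row per id.
via_lapply <- lapply(sentences, toy_scores, ids)
sapply(via_lapply, nrow)

# Map() iterates over both vectors in parallel, one (text, id) pair per call.
via_map <- Map(toy_scores, sentences, ids)
sapply(via_map, nrow)

# unname() drops the sentence names that Map() attaches to the list, so
# rbind() yields plain integer row names instead of the sentences.
result <- do.call(rbind, unname(via_map))
result
```

Wrapping the Map() result in unname() is optional; it is only there if you would rather not have the sentences as row names.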
It can also be written as
do.call(rbind, do.call(Map, c(f = scores, unname(haiku_df[2:1]))))
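The nested do.call() is a bit dense, so here is the same pattern unpacked step by step. Again toy_scores() and toy_df are hypothetical stand-ins so the snippet runs without quanteda.textstats; treat it as a sketch of the call pattern rather than of the readability scoring.

```r
# Hypothetical stand-in for scores(), used only to show the call pattern.
toy_scores <- function(text, id) {
  data.frame(id = id, n_char = nchar(text))
}

toy_df <- data.frame(id = c(1, 2),
                     sentences = c("Mapping a function",
                                   "Questions than answers"),
                     stringsAsFactors = FALSE)

# toy_df[2:1] reorders the columns to (sentences, id); unname() strips the
# column names so they are matched to toy_scores() by position, not by name.
args <- c(f = toy_scores, unname(toy_df[2:1]))

# Equivalent to Map(f = toy_scores, toy_df$sentences, toy_df$id)
rows <- do.call(Map, args)

# Bind the one-row data frames into a single data frame.
combined <- do.call(rbind, unname(rows))
combined
```

This form is mostly useful when the columns to iterate over are chosen programmatically; for two known columns the plain Map() call is easier to read.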
Or using tidyverse
library(dplyr)
library(purrr)
library(tidyr)
haiku_df %>%
  transmute(out = map2(sentences, id, scores)) %>%
  unnest(out)
-output
# A tibble: 3 x 3
# id flesch_score fog_score
# <dbl> <dbl> <dbl>
#1 1 62.8 1.2
#2 2 102. 2.4
#3 3 62.8 1.2
Or using rowwise
haiku_df %>%
  rowwise %>%
  transmute(scores(sentences, id)) %>%
  ungroup
# A tibble: 3 x 3
# id flesch_score fog_score
# <dbl> <dbl> <dbl>
#1 1 62.8 1.2
#2 2 102. 2.4
#3 3 62.8 1.2
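If the end goal is just one row of scores per sentence, it may be possible to skip the mapping entirely. As far as I know, textstat_readability() accepts a character vector and a vector of measures, returning one row per input text, so a single call should cover all sentences. The sketch below assumes that behaviour and that rows come back in input order; the requireNamespace() guard and the readability_df name are my additions, not part of the question.

```r
readability_df <- NULL

if (requireNamespace("quanteda.textstats", quietly = TRUE)) {
  haiku_df <- data.frame(id = c(1, 2, 3),
                         sentences = c("Mapping a function",
                                       "Can sometimes leave me with more",
                                       "Questions than answers"),
                         stringsAsFactors = FALSE)

  # One call scores every sentence and computes both measures
  # (assumed: one output row per element of the input vector).
  readability <- quanteda.textstats::textstat_readability(
    haiku_df$sentences,
    measure = c("Flesch", "FOG")
  )

  # Attach the ids by position and rename to match the scores() output.
  readability_df <- data.frame(id = haiku_df$id,
                               flesch_score = readability$Flesch,
                               fog_score = readability$FOG)
}

readability_df
```

This avoids calling the function once per row, which is usually where the time goes when scaling up.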
Upvotes: 1