NovaEthos

Reputation: 500

Function is returning all values instead of one when mapped

Hoping to get some advice on how to map a function that returns a vector of readability scores for a sentence (eventually going to bind them all). I've tried two different ways, but have only figured out how to get it with a for loop so far.

library(quanteda.textstats)
haiku_df <- data.frame(id = c(1,2,3), 
                       sentences = c("Mapping a function",
                                     "Can sometimes leave me with more",
                                     "Questions than answers"))

I thought this would return one row of scores per list element, but instead each element repeats its scores nrow(haiku_df) times:

scores <- function(text,id){
  flesch_score <- textstat_readability(text, measure = "Flesch")$Flesch
  fog_score <- textstat_readability(text, measure = "FOG")$FOG
  row <- data.frame(id, flesch_score, fog_score)
  row
}
score_df <- list()
score_df <- lapply(haiku_df$sentences, scores, haiku_df$id)
score_df


> score_df
[[1]]
  id flesch_score fog_score
1  1        62.79       1.2
2  2        62.79       1.2
3  3        62.79       1.2

[[2]]
  id flesch_score fog_score
1  1      102.045       2.4
2  2      102.045       2.4
3  3      102.045       2.4

[[3]]
  id flesch_score fog_score
1  1        62.79       1.2
2  2        62.79       1.2
3  3        62.79       1.2

This is in the right direction but still incorrect (adding n as an argument):

scores2 <- function(text,id,n){
  flesch_score <- textstat_readability(text[n], measure = "Flesch")$Flesch
  fog_score <- textstat_readability(text[n], measure = "FOG")$FOG
  row <- data.frame(id[n], flesch_score, fog_score)
  row
}
score2_df <- list()
score2_df <- lapply(haiku_df$sentences, scores2, haiku_df$id, n = 1:nrow(haiku_df))


> score2_df
[[1]]
  id.n. flesch_score fleschkincaid_score
1     1        62.79            5.246667
2     2           NA                  NA
3     3           NA                  NA

[[2]]
  id.n. flesch_score fleschkincaid_score
1     1      102.045           0.5166667
2     2           NA                  NA
3     3           NA                  NA

[[3]]
  id.n. flesch_score fleschkincaid_score
1     1        62.79            5.246667
2     2           NA                  NA
3     3           NA                  NA

The trusty for loop gets me what I want, but I expect it to be slower when scaled up.

score3_df <- list()
for (i in 1:nrow(haiku_df)){
  score3_df[[i]] <- scores(haiku_df$sentences[i],haiku_df$id[i])
}

> dplyr::bind_rows(score3_df)
  id flesch_score fog_score
1  1       62.790       1.2
2  2      102.045       2.4
3  3       62.790       1.2

Feel like I'm overlooking something really simple, but can't seem to figure it out. Thanks!

Upvotes: 2

Views: 56

Answers (1)

akrun

Reputation: 887038

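lapply iterates only over 'sentences' and passes the whole 'id' vector to every call, so each data frame gets all three ids with the scores recycled across them. We need Map instead of lapply, because Map passes each element of 'sentences' together with the corresponding 'id':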

do.call(rbind, Map(scores, haiku_df$sentences, haiku_df$id))
#                                  id flesch_score fog_score
#Mapping a function                1       62.790       1.2
#Can sometimes leave me with more  2      102.045       2.4
#Questions than answers            3       62.790       1.2
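
To make the difference concrete, here is a minimal base R sketch that needs no text package. `toy_scores` is a hypothetical stand-in for `scores` (it uses `nchar` instead of a readability measure), so only the shape of the results matters, not the values.

```r
# Hypothetical stand-in for scores(): builds a data frame from text and id.
toy_scores <- function(text, id) {
  data.frame(id, n_chars = nchar(text))
}

sentences <- c("a", "bb", "ccc")
ids <- c(1, 2, 3)

# lapply() iterates over `sentences` only; the full `ids` vector is passed
# to every call, so each result has length(ids) rows.
via_lapply <- lapply(sentences, toy_scores, ids)

# Map() walks both vectors in parallel, so each call sees one sentence
# and one id and returns a single row.
via_map <- Map(toy_scores, sentences, ids)

rows_lapply <- vapply(via_lapply, nrow, integer(1))
rows_map <- vapply(via_map, nrow, integer(1))

# Bind the one-row results into a single data frame.
combined <- do.call(rbind, via_map)
```

Here `rows_lapply` is 3 for every element, while `rows_map` is 1 for every element, which is the behaviour the question was after.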

It can also be written as

do.call(rbind, do.call(Map, c(f = scores, unname(haiku_df[2:1]))))

Or using tidyverse

library(dplyr)
library(purrr)
library(tidyr)
haiku_df %>%
      transmute(out = map2(sentences, id, scores)) %>%
      unnest(out)

Output:

# A tibble: 3 x 3
#     id flesch_score fog_score
#  <dbl>        <dbl>     <dbl>
#1     1         62.8       1.2
#2     2        102.        2.4
#3     3         62.8       1.2

Or using rowwise

haiku_df %>%
    rowwise() %>%
    transmute(scores(sentences, id)) %>%
    ungroup()
# A tibble: 3 x 3
#     id flesch_score fog_score
#  <dbl>        <dbl>     <dbl>
#1     1         62.8       1.2
#2     2        102.        2.4
#3     3         62.8       1.2
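
If the real goal is just one row of scores per sentence, it may be possible to skip the mapping entirely. As far as I know, `textstat_readability` accepts a character vector and treats each element as its own document, and `measure` can take several names at once. The sketch below assumes that behaviour and the `Flesch` and `FOG` column names already used in the question; `res` is just an illustrative name.

```r
library(quanteda.textstats)

haiku_df <- data.frame(id = c(1, 2, 3),
                       sentences = c("Mapping a function",
                                     "Can sometimes leave me with more",
                                     "Questions than answers"))

# One call scores every sentence; each element of the character vector
# is handled as a separate document.
res <- textstat_readability(haiku_df$sentences,
                            measure = c("Flesch", "FOG"))

# Reattach the ids and rename the columns to match scores().
score_df <- data.frame(id = haiku_df$id,
                       flesch_score = res$Flesch,
                       fog_score = res$FOG)
```

This also computes both measures in a single pass over the text instead of calling `textstat_readability` twice per sentence.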

Upvotes: 1
