user1828605

Reputation: 1735

How to add a column to a dataframe using sparklyr?

I need to mutate a dataframe and add a column based on whether the word "Health" appears in another column. This code runs fine in R with dplyr, but it doesn't work when I use sparklyr. This is my first time using sparklyr. How can I fix this?

bmk_tbl %>% add_column(healthcare = case_when(
                                          grepl("Health", .$OrganizationType) ~ 1, 
                                          TRUE ~ 0), .after = "OrganizationType")

I get the following error and don't know how to fix it:

Error in if (nrow(df) != nrow(.data)) { : missing value where TRUE/FALSE needed

I wasn't sure what to try, so I tried something like this:

bmk_tbl %>% add_column(healthcare = case_when(
                                          (.$OrganizationType %in% c("Health") ~ 1), 
                                          TRUE ~ 0), .after = "OrganizationType")

but this doesn't work because the column never contains just the word "Health"; it is always combined with other words.

Upvotes: 0

Views: 1124

Answers (1)

10465355

Reputation: 4621

You have two unrelated problems here:

  • Mutating primitives like add_column are applicable only to data.frames, and tbl_spark is not one. This accounts for the following error:

    Error in if (nrow(df) != nrow(.data)) { : missing value where TRUE/FALSE needed
    

    In fact, you should also see an accompanying warning on the first invocation:

    In addition: Warning message:
    `.data` must be a data frame in `add_column()`.
    

    The right function to use here is mutate.

  • grepl is not translated into a SQL primitive. Instead, you should use Spark SQL's RLIKE operator, which you can call from dplyr as %rlike%.

Combined:

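library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")  # assumes a local Spark installation
data <- copy_to(sc, iris, overwrite = TRUE)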

data %>% 
  mutate(match = case_when(
    Species %rlike% "tos" ~ 1,
    TRUE ~ 0
  ))

or simply:

data %>%
    mutate(match = as.numeric(Species %rlike% "tos"))
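Applied to the column from the question, the same pattern might look like the sketch below. It assumes a local Spark installation, and because the real bmk_tbl is not shown, it builds a small hypothetical stand-in table (bmk_local, with made-up OrganizationType values) just to illustrate the idea.

```r
library(sparklyr)
library(dplyr)

# Assumes Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical stand-in for the question's bmk_tbl; the real table is
# assumed to be a tbl_spark with a character OrganizationType column.
bmk_local <- data.frame(
  OrganizationType = c("Public Health Agency", "Retail", "Healthcare Provider"),
  stringsAsFactors = FALSE
)
bmk_tbl <- copy_to(sc, bmk_local, "bmk_tbl", overwrite = TRUE)

# mutate() works on a tbl_spark (add_column() does not); %rlike% is passed
# through to Spark SQL's RLIKE, so the substring match runs inside Spark.
bmk_flagged <- bmk_tbl %>%
  mutate(healthcare = as.numeric(OrganizationType %rlike% "Health"))

# Bring the (small) result back into R to inspect it.
result <- collect(bmk_flagged)
print(result)

spark_disconnect(sc)
```

Note that mutate() appends the new column at the end by default rather than honoring add_column's .after argument, so reorder with select() if position matters. RLIKE uses Java regular expressions and is case-sensitive; a pattern such as "(?i)health" would also match lowercase spellings.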

Upvotes: 1
