Reputation: 1735
I have to mutate the dataframe and add a column based on whether a certain word, Health, appears in a column. This code runs fine when I run it in R with dplyr, but it doesn't run when I use sparklyr. This is the first time I'm using sparklyr. How can I fix this?
bmk_tbl %>% add_column(healthcare = case_when(
grepl("Health", .$OrganizationType) ~ 1,
TRUE ~ 0), .after = "OrganizationType")
I get the following error, and I don't know how to fix it:
Error in if (nrow(df) != nrow(.data)) { : missing value where TRUE/FALSE needed
I'm not sure what to try so I tried doing something like this:
bmk_tbl %>% add_column(healthcare = case_when(
(.$OrganizationType %in% c("Health") ~ 1),
TRUE ~ 0), .after = "OrganizationType")
but this won't work because Health never appears on its own in the data. It's always combined with other words.
Upvotes: 0
Views: 1124
Reputation: 4621
You have two unrelated problems here:
First, mutating primitives like add_column are applicable only to data.frames, and a tbl_spark is not one. This accounts for the following error:
Error in if (nrow(df) != nrow(.data)) { : missing value where TRUE/FALSE needed
In fact, you should also see an accompanying warning on the first invocation:
In addition: Warning message:
`.data` must be a data frame in `add_column()`.
The right function to use here is mutate.
Second, grepl is not translated into a SQL primitive. Instead, you should use rlike.
Combined:
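library(sparklyr)
library(dplyr)
# Assumes a local Spark installation; adjust master for your cluster
sc <- spark_connect(master = "local")
data <- copy_to(sc, iris, overwrite = TRUE)
data %>%
  mutate(match = case_when(
    Species %rlike% "tos" ~ 1,  # regex match, evaluated by Spark as RLIKE
    TRUE ~ 0
  ))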
or simply
data %>%
mutate(match = as.numeric(Species %rlike% "tos"))
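Applied to the table from the question, the same idea might look like the sketch below. The sample rows, the Name column, and the local Spark connection are assumptions made up for illustration; only bmk_tbl, OrganizationType, and healthcare come from the question. Since add_column's .after argument is not available here, the sketch reorders the columns with select instead.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical stand-in for bmk_tbl; only OrganizationType comes from the question.
bmk_local <- data.frame(
  Name = c("A", "B", "C"),
  OrganizationType = c("Public Health Agency", "University", "Health Care Provider"),
  stringsAsFactors = FALSE
)
bmk_tbl <- copy_to(sc, bmk_local, name = "bmk", overwrite = TRUE)

result <- bmk_tbl %>%
  # %rlike% is not an R function; it is passed through to Spark SQL as RLIKE
  mutate(healthcare = as.numeric(OrganizationType %rlike% "Health")) %>%
  # Place the new column right after OrganizationType
  select(Name, OrganizationType, healthcare, everything())

# Bring the (small) result back into R to inspect it
result_local <- collect(result)
print(result_local)

spark_disconnect(sc)
```

Because the pattern is a regular expression, "Health" matches anywhere in the string, which covers the case where the word is mixed with other words.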
Upvotes: 1