Reputation: 55
Example sample data:
Si K Ca Ba Fe Type
71.78 0.06 8.75 0 0 1
72.73 0.48 7.83 0 0 1
72.99 0.39 7.78 0 0 1
72.61 0.57 na 0 0 na
73.08 0.55 8.07 0 0 1
72.97 0.64 8.07 0 na 1
73.09 na 8.17 0 0 1
73.24 0.57 8.24 0 0 1
72.08 0.56 8.3 0 0 1
72.99 0.57 8.4 0 0.11 1
na 0.67 8.09 0 0.24 1
We can load the data into Spark through sparklyr with the following code:
sdf_copy_to(sc, sampledata)
I am looking for a query that returns the columns containing NA values, together with the number of NAs in each, for example:
si k ca fe
1 1 1 2
Upvotes: 4
Views: 1369
Reputation: 330353
This problem is actually a bit tricky because of the tbl_spark implementation and incompatibilities between Spark and R semantics. Even if you could apply colSums, Spark SQL doesn't allow implicit conversions between booleans and numerics. This means you have to apply as.numeric explicitly:
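library(dplyr)

sampledata <- copy_to(sc, data.frame(x = c(1, NA, 2), y = c(NA, 2, NA), z = 42))

sampledata %>%
  mutate_all(is.na) %>%       # flag missing values as booleans
  mutate_all(as.numeric) %>%  # cast the booleans to numerics explicitly
  summarize_all(sum)          # count the missing values in each column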
# Source: lazy query [?? x 3]
# Database: spark_connection
      x     y     z
  <dbl> <dbl> <dbl>
1     1     2     0
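For comparison, here is a minimal local sketch in plain R (no Spark involved) of what colSums does when the implicit boolean-to-numeric coercion is available, and of how the result could be narrowed to only the columns that contain NAs, as the question asks. The names local_df, na_counts and cols_with_na are hypothetical and used only for illustration. This runs on an ordinary data.frame, not on a tbl_spark.

```r
# Local stand-in with the same values as the Spark example above
# (local_df is a hypothetical name, not part of the original answer).
local_df <- data.frame(x = c(1, NA, 2), y = c(NA, 2, NA), z = 42)

# In plain R, is.na() returns a logical matrix and colSums() coerces
# TRUE/FALSE to 1/0 implicitly, so no explicit cast is needed here.
na_counts <- colSums(is.na(local_df))

# Keep only the columns that contain at least one NA.
cols_with_na <- na_counts[na_counts > 0]
print(cols_with_na)
```

This keeps x (1 missing value) and y (2 missing values) and drops z. The same kind of filter could presumably be applied to the one-row Spark summary after bringing it into R with collect(), since the summary is small enough to handle locally.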
Upvotes: 1