Reputation: 305
I have a dataframe with dates that will be changing as some column names. I want this added on to the end as a new column. I need a count of the number in each row that is NA
. I'd ideally like to do this in dplyr
, and it's originally coming from a SQL server so I'm using dbplyr
as the beginning of the dataframe. Reproduction of the DF is below.
repro <- structure(list(UPC_Sell_Unit_Desc = c("MARL GOLD KSB/PACK/000002820000384/1","MARL KSF NM BX KING/PACK/000002820000357/1", "MARL GOLD KSB/PACK/000002820000384/1","MARL KSF NM BX KING/PACK/000002820000357/1", "MARL GOLD KSB/PACK/000002820000384/1","MARL KSF NM BX KING/PACK/000002820000357/1", "MARL GOLD KSB/PACK/000002820000384/1","MARL KSF NM BX KING/PACK/000002820000357/1", "MARL GOLD KSB/PACK/000002820000384/1","MARL KSF NM BX KING/PACK/000002820000357/1"), `Region Manager_desc` = c("Debbie Russell","Debbie Russell", "Debbie Russell", "Debbie Russell", "Debbie Russell","Debbie Russell", "Debbie Russell", "Jim Fiene", "Lou Valdes","Lou Valdes"), Area_desc = c("Area 101", "Area 101", "Area 101","Area 101", "Area 101", "Area 101", "Area 101", "Area 209", "Area 301","Area 301"), Location_ID = c(2358L, 2358L, 2367L, 2367L, 2368L,2368L, 2369L, 5060L, 9002L, 9002L), `2021-05-09` = c(17, 8, 12,4, 5, 9, 8, 5, 7, 4), `2021-05-10` = c(14, 6, 15, 8, 3, 18, 4,7, 10, 15), `2021-05-11` = c(14, 6, 15, 1, NA, 5, 5, 8, 14, 3), `2021-05-12` = c(9, 3, 9, 4, 1, 12, 4, 7, 7, 11), `2021-05-13` = c(10,13, 16, 4, 2, 9, 7, 8, 12, 9), `2021-05-14` = c(26, 8, 16, 3,1, 10, 6, 4, 16, 13), `2021-05-15` = c(15, 8, 12, 3, 2, 3, 11,9, 11, 10)), row.names = c(NA, -10L), groups = structure(list(`Region Manager_desc` = c("Debbie Russell", "Debbie Russell","Debbie Russell", "Debbie Russell", "Debbie Russell", "Debbie Russell","Debbie Russell", "Jim Fiene", "Lou Valdes", "Lou Valdes"), Area_desc = c("Area 101", "Area 101", "Area 101", "Area 101","Area 101", "Area 101", "Area 101", "Area 209", "Area 301","Area 301"), Location_ID = c(2358L, 2358L, 2367L, 2367L,2368L, 2368L, 2369L, 5060L, 9002L, 9002L), UPC_Sell_Unit_Desc = c("MARL GOLD KSB/PACK/000002820000384/1","MARL KSF NM BX KING/PACK/000002820000357/1", "MARL GOLD KSB/PACK/000002820000384/1","MARL KSF NM BX KING/PACK/000002820000357/1", "MARL GOLD KSB/PACK/000002820000384/1","MARL KSF NM BX KING/PACK/000002820000357/1", "MARL GOLD KSB/PACK/000002820000384/1","MARL KSF NM BX KING/PACK/000002820000357/1", "MARL GOLD KSB/PACK/000002820000384/1","MARL KSF NM BX KING/PACK/000002820000357/1"), .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of","vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df","tbl", "data.frame"), .drop = TRUE), class = c("grouped_df","tbl_df", "tbl", "data.frame"))
Upvotes: 0
Views: 114
Reputation: 18642
FYI, your reproducible example is a grouped dataframe, that's why I passed it to ungroup
first. If it weren't a grouped dataframe then you do not need that pipe.
library(dplyr)
repro %>%
ungroup() %>%
mutate(isna = rowSums(is.na(.)))
Or you can use rowwise
:
repro %>%
rowwise() %>%
mutate(isna = sum(is.na(c_across()))) %>%
ungroup()
Here ungroup
is because rowwise
outputs a type of grouped dataframe. The default for c_across
is across all columns (eg everything()
), but you can use other tidyselect options as an argument for a subset of columns.
rowSums
is faster, but rowwise
allows you to use functions for which there isn't a row-wise variant.
Upvotes: 2