STATMATT
STATMATT

Reputation: 17

Is there a tidyverse Solution to changing the values of multiple columns to a specific value. R

Vaccination totals of the US from CDC using tidyverse/dplyr

Original dataset is pulled from https://covid.cdc.gov/covid-data-tracker/#vaccinations_vacc-total-admin-rate-total

I am down to the summary of the variables of interest. My desire here is to change the column's that contain 'PCT' to NA, as a sum of percent's is not meaningful.

# summarise_all(CDC_vacc_State, ~if(is.numeric(.)) sum(., na.rm = T) else "National")

Here I am going to output the dataset. Here we can see that there are 8 columns that contain 'PCT'. I do understand that my national population is off by 1.2M, I back-solved it from the data rather than merging in Census values.

# dput(summarise_all(CDC_vacc_State, ~if(is.numeric(.)) sum(., na.rm = T) else "National"))

df <- structure(
  list(
    STATE = "National",
    ONE_DOSE_NUM = 220978782,
    ONE_DOSE_PCT = 3752.9,
    FULL_DOSE_NUM = 190719566,
    FULL_DOSE_PCT = 3276.7,
    POP = 334203604,
    ONE_DOSE_NUM_18Plus = 206114881,
    ONE_DOSE_PCT_18Plus = 4412.7,
    FULL_DOSE_NUM_18Plus = 178587829,
    FULL_DOSE_PCT_18Plus = 3883.9,
    POP_18Plus = 258316189,
    ONE_DOSE_NUM_65Plus = 53312849,
    ONE_DOSE_PCT_65Plus = 5112.9,
    FULL_DOSE_NUM_65Plus = 47324330,
    FULL_DOSE_PCT_65Plus = 4652.6,
    POP_65Plus = 55128330,
    ONE_DOSE_NUM_12Plus = 219898471,
    ONE_DOSE_PCT_12Plus = 4258.4,
    FULL_DOSE_NUM_12Plus = 189835922,
    FULL_DOSE_PCT_12Plus = 3733.8,
    POP_12Plus = 282154872
  ),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA,-1L)
)

Change the values in columns containing 'PCT' to NA using tidyverse?

I have figured out a solution, but I was hoping to do this inline using dplyr and use a bind_rows() or rbindlist() with the original dataset.

# this is my current solution
df[,names(select(df, contains('PCT')))] <- NA  

Final dataset looks like

dput(df)

structure(list(STATE = "National", ONE_DOSE_NUM = 220978782, 
    ONE_DOSE_PCT = NA, FULL_DOSE_NUM = 190719566, FULL_DOSE_PCT = NA, 
    POP = 334203604, ONE_DOSE_NUM_18Plus = 206114881, ONE_DOSE_PCT_18Plus = NA, 
    FULL_DOSE_NUM_18Plus = 178587829, FULL_DOSE_PCT_18Plus = NA, 
    POP_18Plus = 258316189, ONE_DOSE_NUM_65Plus = 53312849, ONE_DOSE_PCT_65Plus = NA, 
    FULL_DOSE_NUM_65Plus = 47324330, FULL_DOSE_PCT_65Plus = NA, 
    POP_65Plus = 55128330, ONE_DOSE_NUM_12Plus = 219898471, ONE_DOSE_PCT_12Plus = NA, 
    FULL_DOSE_NUM_12Plus = 189835922, FULL_DOSE_PCT_12Plus = NA, 
    POP_12Plus = 282154872), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"))

Upvotes: 0

Views: 106

Answers (1)

Jon Spring
Jon Spring

Reputation: 66415

dplyr equivalent of your df[,names(select(df, contains('PCT')))] <- NA:

df %>%
  mutate(across(contains("PCT"), ~NA))

Alternatively, you could deal with this "down-stream" by summarizing only the non-PCT columns. If you bind_rows to your original data, I believe the PCT columns would NA since they wouldn't be in the result here:

df %>%  
  summarize(across(where(is.numeric) & !contains("PCT"),
                   ~sum(.x, na.rm = T)))

Upvotes: 3

Related Questions