k.rudin

Reputation: 3

Changing column names then converting all negative values to NA

Hi, I am trying to rename my columns and then convert all negative values in every column to NA. I got the second part right, but for some reason the column names do not change. This is my code; note that mscr is the data read from the CSV whose columns I wish to rename, and df2 is the name I give the result. Thank you for your time and help.

df2 <- mscr %>%
  rename(
    caseid = R0000100,
    children2000 = R6389600
    )

df2 <- mscr
df2[df2 < 0] <- NA

Upvotes: 0

Views: 177

Answers (1)

r2evans

Reputation: 160447

I might be misunderstanding, but I think what you're doing is renaming the columns (successfully), and then over-writing the newly-renamed data with the original. That is,

df2 <- mscr %>% rename(...)

is correct, and the names should then be changed. The moment you then do

df2 <- mscr

before you replace the negative values, you revert any changes you made.

rename (like just about every "verb" function in dplyr, and many functions in R) operates in a functional manner: it returns a modified copy and leaves the input data completely unchanged. If it changed the input in place, that would be a "side effect", which is antithetical to the normal, idiomatic way of doing things in R.
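To see this in isolation, here is a minimal sketch with a small made-up data frame. The values are hypothetical stand-ins for your CSV; only the column names come from your question.

```r
library(dplyr)

# Hypothetical stand-in for mscr; only the column names come from the question
mscr <- data.frame(R0000100 = c(1, 2, 3), R6389600 = c(-4, 0, 2))

# rename() returns a new data frame and leaves mscr untouched
df2 <- mscr %>%
  rename(
    caseid = R0000100,
    children2000 = R6389600
  )

names(mscr)  # still the original names
names(df2)   # the new names

# Assigning mscr to df2 again discards the renamed copy
df2 <- mscr
names(df2)   # back to the original names
```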

Try this:

library(dplyr)
df2 <- mscr %>%
  rename(
    caseid = R0000100,
    children2000 = R6389600
  ) %>% 
  mutate(across(everything(), ~ if_else(. < 0, .[NA], .)))

One would normally want to use just NA, but since a bare NA is technically of class logical, and I'm inferring that your data is numeric or integer, we need an NA of the right class. One option is to do this step separately for numeric and then integer columns, using NA_real_ and NA_integer_, respectively. However, .[NA] indexes the column with a missing value, which gives NA values of the same class as the original column data, so one expression covers both cases.
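A short sketch of both points, using toy vectors that I made up for illustration (x_int, x_dbl, and toy are not from your data). The second part shows the per-type alternative, assuming a dplyr version that provides across() and where():

```r
library(dplyr)

# Hypothetical toy vectors, one integer and one double
x_int <- c(-1L, 2L, 3L)
x_dbl <- c(-1.5, 2, 3)

class(NA)         # "logical"
class(x_int[NA])  # "integer"
class(x_dbl[NA])  # "numeric"

# Alternative: handle integer and double columns separately,
# each with the NA constant of the matching type
toy <- data.frame(a = x_int, b = x_dbl)
toy2 <- toy %>%
  mutate(
    across(where(is.integer), ~ if_else(. < 0L, NA_integer_, .)),
    across(where(is.double), ~ if_else(. < 0, NA_real_, .))
  )
```

The .[NA] version is shorter because it does not need to know the column type in advance.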

Upvotes: 1
