ah bon

Reputation: 10021

Replace multiple characters from multiple columns in R

Given a dataframe as follows:

structure(list(date = structure(1:24, .Label = c("2010Y1-01m", 
"2010Y1-02m", "2010Y1-03m", "2010Y1-04m", "2010Y1-05m", "2010Y1-06m", 
"2010Y1-07m", "2010Y1-08m", "2010Y1-09m", "2010Y1-10m", "2010Y1-11m", 
"2010Y1-12m", "2011Y1-01m", "2011Y1-02m", "2011Y1-03m", "2011Y1-04m", 
"2011Y1-05m", "2011Y1-06m", "2011Y1-07m", "2011Y1-08m", "2011Y1-09m", 
"2011Y1-10m", "2011Y1-11m", "2011Y1-12m"), class = "factor"), 
    a = structure(c(1L, 18L, 19L, 20L, 22L, 23L, 2L, 4L, 5L, 
    7L, 8L, 10L, 1L, 21L, 3L, 6L, 9L, 11L, 12L, 13L, 14L, 15L, 
    16L, 17L), .Label = c("--", "10159.28", "10295.69", "10580.82", 
    "10995.65", "11245.84", "11327.23", "11621.99", "12046.63", 
    "12139.78", "12848.27", "13398.26", "13962.6", "14559.72", 
    "14982.58", "15518.64", "15949.87", "7363.45", "8237.71", 
    "8830.99", "9309.47", "9316.56", "9795.77"), class = "factor"), 
    b = structure(c(2L, 16L, 23L, 24L, 4L, 6L, 7L, 9L, 10L, 12L, 
    14L, 17L, 1L, 22L, 3L, 5L, 8L, 11L, 13L, 15L, 18L, 19L, 20L, 
    21L), .Label = c("-", "--", "1058.18", "1455.6", "1539.01", 
    "1867.07", "2036.92", "2102.23", "2372.84", "2693.96", "2769.65", 
    "2973.04", "3146.88", "3227.23", "3604.71", "365.07", "3678.01", 
    "4043.18", "4438.55", "4860.76", "5360.94", "555.51", "653.19", 
    "980.72"), class = "factor"), c = structure(c(2L, 6L, 10L, 
    11L, 13L, 15L, 16L, 18L, 20L, 22L, 24L, 7L, 1L, 9L, 12L, 
    14L, 17L, 19L, 21L, 23L, 3L, 4L, 5L, 8L), .Label = c("-", 
    "--", "1092.73", "1222.48", "1409.07", "158.18", "1748.44", 
    "2179.42", "227.68", "268.53", "331.81", "366.95", "434.19", 
    "486.41", "538.49", "606.62", "614.75", "651.46", "729.44", 
    "736.55", "836.46", "890.81", "929.72", "981.65"), class = "factor")), class = "data.frame", row.names = c(NA, 
-24L))

How can I replace "--" and "-" with NA in columns a and b only, leaving column c unchanged? Thanks.

Upvotes: 0

Views: 59

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

I think it's better to avoid the data being read in like this in the first place, but if you need to correct it afterwards, you can use the na.strings argument of type.convert. Note that the argument is plural (na.strings, with an "s"), so more than one value can be used to represent NA.

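# as.is = TRUE avoids the "'as.is' should be specified by the caller" warning in R >= 4.0.0
df[c("a", "b")] <- lapply(df[c("a", "b")], type.convert, na.strings = c("--", "-"), as.is = TRUE)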
str(df)
# 'data.frame':   24 obs. of  4 variables:
#  $ date: Factor w/ 24 levels "2010Y1-01m","2010Y1-02m",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ a   : num  NA 7363 8238 8831 9317 ...
#  $ b   : num  NA 365 653 981 1456 ...
#  $ c   : Factor w/ 24 levels "-","--","1092.73",..: 2 6 10 11 13 15 16 18 20 22 ...
head(df)
#         date       a       b      c
# 1 2010Y1-01m      NA      NA     --
# 2 2010Y1-02m 7363.45  365.07 158.18
# 3 2010Y1-03m 8237.71  653.19 268.53
# 4 2010Y1-04m 8830.99  980.72 331.81
# 5 2010Y1-05m 9316.56 1455.60 434.19
# 6 2010Y1-06m 9795.77 1867.07 538.49
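
As a rough sketch of the first suggestion (handling the placeholders at read time), read.csv accepts the same na.strings argument. The inline csv_text below is a made-up stand-in for the real file, whose path and format are not given in the question. Keep in mind that na.strings applies to every column, so c would be converted as well, which may or may not be what you want.

```r
# Hypothetical CSV content standing in for the real input file
csv_text <- "date,a,b,c
2010Y1-01m,--,--,--
2010Y1-02m,7363.45,365.07,158.18
2011Y1-01m,--,-,-"

# Treat both "--" and "-" as missing while reading;
# na.strings matches whole fields, so the "-" inside the dates is untouched
df2 <- read.csv(text = csv_text,
                na.strings = c("--", "-"),
                stringsAsFactors = FALSE)

str(df2)  # a, b and c should all come back as numeric, with NA for the placeholders
```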

Note that in this particular case, you could also rely on as.numeric(as.character(...)) turning anything that can't be coerced to numeric into NA, but keep in mind that you will get a "NAs introduced by coercion" warning for each column you apply this to.

# Assign the result back so the conversion is kept
df[c("a", "b")] <- lapply(df[c("a", "b")], function(x) as.numeric(as.character(x)))
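
The as.character step matters because the columns are factors: calling as.numeric directly on a factor returns the internal level codes rather than the printed values. A small illustration on a made-up factor f (not part of the original data), with suppressWarnings used only to silence the expected coercion warning:

```r
# Toy factor mimicking one of the problem columns
f <- factor(c("--", "7363.45", "8237.71"))

# Directly on the factor: yields the integer level codes, not the values
codes <- as.numeric(f)

# Via character: yields the values, with "--" becoming NA
vals <- suppressWarnings(as.numeric(as.character(f)))

print(codes)
print(vals)
```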

Upvotes: 1

Ronak Shah

Reputation: 388982

You can use:

cols <- c('a', 'b')
df[cols][df[cols] == '--' | df[cols] == '-'] <- NA

Or, using dplyr:

library(dplyr)
df %>% mutate(across(c(a, b), ~replace(., . %in% c('--', '-'), NA)))
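
One thing to be aware of with either variant: when a and b are factors, as in the question, they remain factors after the replacement. If numeric columns are wanted, a conversion step is still needed. A minimal sketch on a made-up three-row data frame (df_small is hypothetical, not the original data):

```r
# Small stand-in for the question's data frame
df_small <- data.frame(
  a = factor(c("--", "7363.45", "8237.71")),
  b = factor(c("-", "365.07", "653.19")),
  c = factor(c("--", "158.18", "268.53"))
)

cols <- c("a", "b")

# Step 1: replace the placeholders with NA in a and b only
df_small[cols][df_small[cols] == "--" | df_small[cols] == "-"] <- NA
classes_after_replace <- sapply(df_small, class)  # a and b are still factors here

# Step 2: convert the cleaned factor columns to numeric;
# no coercion warning is expected, since the placeholders are already NA
df_small[cols] <- lapply(df_small[cols], function(x) as.numeric(as.character(x)))

str(df_small)
```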

Upvotes: 2
