Alex Petralia

Reputation: 1770

na.strings applied to a dataframe

I currently have a data frame containing several values that I would like converted to NA. When I first imported it from a .csv, I could use na.strings=c("A", "B", "C") and so on to remove the values I didn't want.

I want to do the same thing again, but this time on a data frame that already exists, without importing another .csv.

To import the data, I used:

data<-read.csv("code.csv", header=T, strip.white=TRUE, stringsAsFactors=FALSE, na.strings=c("", "A", "B", "C"))

Now, with "data", I would like to subset it while removing even more specific values in the rows. I tried something like:

data2<-data.frame(data, na.strings=c("D", "E", "F"))

Of course this doesn't work, because I think na.strings only works with the "read" functions (read.csv, read.table), not with other functions. Is there an equivalent that simply converts certain values to NA, so that I can then call na.omit(data2) fairly easily?

Thanks for your help.

Upvotes: 6

Views: 8970

Answers (4)

Vincent

Reputation: 5249

Since we don't have your data, I will use mtcars. Suppose we want to set every value in mtcars that is equal to 4 or 19.2 to NA:

# row/column positions of every cell equal to 4 or 19.2
ind <- which(mtcars == 4 | mtcars == 19.2, arr.ind = TRUE)
mtcars[ind] <- NA

In your setting you would replace these numbers with "D" and "E".
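For string values, the same idea could look like the sketch below. The data frame `df` and its column names are hypothetical stand-ins for the asker's data, not anything taken from the question.

```r
# Hypothetical stand-in for the asker's data; the column names are made up.
df <- data.frame(id   = c("r1", "r2", "r3", "r4"),
                 code = c("D", "G", "E", "H"),
                 stringsAsFactors = FALSE)

# Logical matrix marking every cell equal to "D" or "E"
hits <- df == "D" | df == "E"

# Row/column positions of the matching cells, then overwrite them with NA
ind <- which(hits, arr.ind = TRUE)
df[ind] <- NA

# Drop every row that now contains at least one NA
clean <- na.omit(df)
```

Here `clean` should keep only the rows whose `code` was neither "D" nor "E".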

Upvotes: 1

Sven Hohenstein

Reputation: 81693

Here's a way to replace values in multiple columns:

# an example data frame
dat <- data.frame(x = c("D", "E", "F", "G"), 
                  y = c("A", "B", "C", "D"), 
                  z = c("X", "Y", "Z", "A"))
#   x y z
# 1 D A X
# 2 E B Y
# 3 F C Z
# 4 G D A

# values to replace
na.strings <- c("D", "E", "F")

# index matrix 
idx <- Reduce("|", lapply(na.strings, "==", dat))

# replace values with NA
is.na(dat) <- idx

dat
#     x    y z
# 1 <NA>    A X
# 2 <NA>    B Y
# 3 <NA>    C Z
# 4    G <NA> A

Upvotes: 3

Prasanna Nandakumar

Reputation: 4335

data[ data == "D" ] = NA

Note that if you were trying to replace NA with "D", the reverse (df[ df == NA ] = "D") will not work, because any comparison with NA returns NA rather than TRUE; you would need to use df[is.na(df)] <- "D" instead.
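A small sketch of why the reverse direction needs is.na(), using a made-up vector `x` rather than the asker's data:

```r
# Made-up example vector with one missing value
x <- c("A", NA, "D")

# Comparing against NA never yields TRUE: every element of the result is NA
cmp <- x == NA

# is.na() is the test that actually identifies missing values
miss <- is.na(x)

# Replace the missing entries with "D"
x[miss] <- "D"
```

The same is.na() test works on a whole data frame, which is what df[is.na(df)] <- "D" relies on.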

Upvotes: 1

mathematical.coffee

Reputation: 56915

Just assign the NA values directly.

e.g.:

x <- data.frame(a=1:5, b=letters[1:5])
# > x
#   a b
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d
# 5 5 e

# convert the 'b' and 'd' in column b to NA
x$b[x$b %in% c('b', 'd')] <- NA
# > x
#   a    b
# 1 1    a
# 2 2 <NA>
# 3 3    c
# 4 4 <NA>
# 5 5    e
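To apply the same %in% replacement to every column at once and then drop the affected rows, something like the following sketch might work. The data frame `dat` and the vector `na_values` are hypothetical names chosen for illustration.

```r
# Hypothetical data; `na_values` plays the role of na.strings from read.csv
dat <- data.frame(a = c("A", "D", "G"),
                  b = c("E", "H", "I"),
                  n = 1:3,
                  stringsAsFactors = FALSE)
na_values <- c("D", "E", "F")

# Replace matching entries column by column; dat[] keeps the data frame shape
dat[] <- lapply(dat, function(col) {
  col[col %in% na_values] <- NA
  col
})

# Keep only the rows without any NA
result <- na.omit(dat)
```

Working column by column leaves each column's type alone, so the integer column `n` stays an integer.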

Upvotes: 2
