Reputation: 2640
I have some "#N/A" values in my data frames that I want to convert to NA. I'm trying what seems like a straightforward grepl via lapply on the data frame, but it's not working. Here's a simple example:
a = c("#N/A", "A", "B", "#N/A", "C")
b = c("d", "#N/A", "e", "f", "123")
df = as.data.frame(cbind(a,b))
lapply(df, function(x){x[grepl("#N/A", x)]=NA})
Which outputs:
$a
[1] NA
$b
[1] NA
Can someone point me in the right direction? I'd appreciate it.
Upvotes: 1
Views: 2743
Reputation: 92292
For the example in your question, you don't need any kind of apply loop; just do:
df[df == "#N/A"] <- NA
For cases where a cell contains the marker more than once, such as #N/A#N/A (although you didn't provide such data), another way to solve this would be:
df[sapply(df, function(x) grepl("#N/A", x))] <- NA
In both cases the data frame itself is updated, rather than just printed to the console.
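To make the difference between the two approaches concrete, here is a minimal sketch. The data frame is hypothetical (the question did not include a doubled marker), and fixed = TRUE is an optional addition so the pattern is treated as a literal string rather than a regular expression.

```r
# Hypothetical data: column `a` contains a doubled marker, "#N/A#N/A".
df <- data.frame(
  a = c("#N/A#N/A", "A", "#N/A"),
  b = c("d", "#N/A", "e"),
  stringsAsFactors = FALSE
)

# Exact comparison: only cells equal to "#N/A" are selected.
exact <- df
exact[exact == "#N/A"] <- NA

# Pattern match: any cell containing "#N/A" is selected.
partial <- df
partial[sapply(partial, function(x) grepl("#N/A", x, fixed = TRUE))] <- NA

exact$a    # the doubled marker survives the exact comparison
partial$a  # the doubled marker is replaced by the pattern match
```

Which one is appropriate depends on whether cells that merely contain the marker should also count as missing.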
Upvotes: 0
Reputation: 121077
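If you are reading this data in from a CSV or tab-delimited file, just set na.strings = "#N/A" when you read it. Use read.csv (or pass comment.char = "" to read.table), because read.table treats # as a comment character by default.
read.csv("my file.csv", na.strings = "#N/A")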
Update from comment: or maybe na.strings = c("#N/A", "#N/A#N/A").
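As a rough illustration of na.strings, the sketch below reads CSV content from an inline string through the text argument of read.csv instead of a file. The contents are made up for the example.

```r
# Hypothetical file contents, supplied inline via `text` instead of a path.
csv_text <- "a,b
#N/A,d
A,#N/A
#N/A#N/A,e"

# na.strings is matched against whole fields, so both markers are listed.
df <- read.csv(
  text = csv_text,
  na.strings = c("#N/A", "#N/A#N/A"),
  stringsAsFactors = FALSE
)

is.na(df)
```

Handling the markers at read time means no clean-up step is needed afterwards.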
Even if you are stuck with the case you described in your question, you still don't need grepl.
df <- data.frame(
  a = c("#N/A", "A", "B", "#N/A", "C"),
  b = c("d", "#N/A", "e", "f", "123")
)
df[] <- lapply(
  df,
  function(x) {
    x[x == "#N/A"] <- NA
    x
  }
)
df
##      a    b
## 1 <NA>    d
## 2    A <NA>
## 3    B    e
## 4 <NA>    f
## 5    C  123
Upvotes: 1
Reputation: 1437
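You need to return x from the function, and it's probably simplest to use apply in this case. Note that apply returns a character matrix rather than a data frame. Creating a data.frame with cbind is best avoided as well, because cbind first builds a character matrix.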
a = c("#N/A", "A", "B", "#N/A", "C")
b = c("d", "#N/A", "e", "f", "123")
df = data.frame(a=a, b=b, stringsAsFactors = FALSE)
str(df)
apply(df, 2, function(x){x[grepl("#N/A", x)] <- NA; return(x)})
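The following sketch shows why the original function returned a single NA and what apply hands back. The names no_return, res_bad, m, and cleaned are hypothetical and used only for illustration.

```r
a <- c("#N/A", "A", "B", "#N/A", "C")
b <- c("d", "#N/A", "e", "f", "123")
df <- data.frame(a = a, b = b, stringsAsFactors = FALSE)

# Without returning x, the function's value is the value of the assignment
# expression, which is its right-hand side (NA).
no_return <- function(x) { x[grepl("#N/A", x)] <- NA }
res_bad <- no_return(df$a)

# apply() coerces the data frame to a matrix and returns a character matrix.
m <- apply(df, 2, function(x) { x[grepl("#N/A", x)] <- NA; x })
is.matrix(m)

# Convert back if a data frame is needed.
cleaned <- as.data.frame(m, stringsAsFactors = FALSE)
```

If the result has to stay a data frame throughout, assigning the result of lapply to df[] avoids the round trip through a matrix.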
Upvotes: 1
Reputation: 179428
Your function needs to return x.
Try:
lapply(df, function(x){x[grepl("#N/A", x)] <- NA; x})
$a
[1] <NA> A B <NA> C
Levels: #N/A A B C
$b
[1] d <NA> e f 123
Levels: #N/A 123 d e f
But you should really use gsub instead of grepl:
lapply(df, function(x)gsub("#N/A", NA, x))
$a
[1] NA "A" "B" NA "C"
$b
[1] "d" NA "e" "f" "123"
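A better (more flexible and possibly easier to maintain) solution might be the following. The as.character() call matters when the columns are factors: without it, ifelse returns the integer factor codes instead of the labels.
replace <- function(x, ptn="#N/A") ifelse(x %in% ptn, NA, as.character(x))
lapply(df, replace)
$a
[1] NA "A" "B" NA "C"
$b
[1] "d" NA "e" "f" "123"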
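As a sketch of the flexibility that matching against a set with %in% gives, the version below accepts several marker strings at once and writes the result back into the data frame. The helper name na_if_in and the sample data are hypothetical, and the columns are deliberately created as factors to show that the labels, not the integer codes, are kept.

```r
# Hypothetical helper; `ptn` may list several strings to treat as missing.
na_if_in <- function(x, ptn = "#N/A") {
  x <- as.character(x)   # work on labels, not factor codes
  x[x %in% ptn] <- NA
  x
}

df <- data.frame(
  a = c("#N/A", "A", "#N/A#N/A"),
  b = c("d", "#N/A", "e"),
  stringsAsFactors = TRUE  # factors, to show the conversion
)

# Assigning into df[] keeps the data frame structure instead of a list.
df[] <- lapply(df, na_if_in, ptn = c("#N/A", "#N/A#N/A"))
df
```

The columns come back as character vectors; convert them with factor() afterwards if factors are required.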
Upvotes: 5