StatsViaCsh

Reputation: 2640

How do I use grepl on each column of a data frame?

Some values in my data frames are #N/A, and I want to convert them to NA. I'm trying what seems like a straightforward grepl via lapply on the data frame, but it's not working. Here's a simple example...

a = c("#N/A", "A", "B", "#N/A", "C")
b = c("d", "#N/A", "e", "f", "123")
df = as.data.frame(cbind(a,b))
lapply(df, function(x){x[grepl("#N/A", x)]=NA})

Which outputs:

$a
[1] NA

$b
[1] NA

Can someone point me in the right direction? I'd appreciate it.

Upvotes: 1

Views: 2743

Answers (4)

David Arenburg

Reputation: 92292

Given the example in your question, you don't need any kind of apply loop. Just do

df[df == "#N/A"] <- NA

For cases where a cell contains something like #N/A#N/A (your example data doesn't include any), another way to solve this is

df[sapply(df, function(x) grepl("#N/A", x))] <- NA

In both cases the data frame itself is updated, rather than just printed to the console.
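A minimal sketch of how the two approaches differ, using a hypothetical toy data frame that contains one doubled #N/A#N/A cell. Exact comparison leaves that cell alone, while grepl catches any cell that merely contains the marker:

```r
# Hypothetical toy data: one cell holds a doubled marker "#N/A#N/A"
df <- data.frame(
  a = c("#N/A", "A", "#N/A#N/A"),
  b = c("d", "#N/A", "e"),
  stringsAsFactors = FALSE
)

# Exact matching: only cells that are exactly "#N/A" become NA
exact <- df
exact[exact == "#N/A"] <- NA

# Pattern matching: any cell that contains "#N/A" becomes NA
# (sapply() returns a logical matrix with the same shape as df)
pattern <- df
pattern[sapply(pattern, function(x) grepl("#N/A", x))] <- NA

exact
pattern
```

Which one you want depends on whether partial matches should count as missing.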

Upvotes: 0

Richie Cotton

Reputation: 121077

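If you are reading this data in from a CSV or tab-delimited file, just set na.strings = "#N/A".

read.csv("my file.csv", na.strings = "#N/A")

(If you use read.table() directly, also set comment.char = "", because its default comment character is "#", which would cut off the #N/A values.)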

Update from comment: or maybe na.strings = c("#N/A", "#N/A#N/A").
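A minimal sketch of the na.strings approach, assuming a small hypothetical CSV passed inline through the text argument instead of a real file:

```r
# Hypothetical CSV content, supplied inline via `text =` rather than a file
csv <- "a,b
#N/A,d
A,#N/A
B,#N/A#N/A"

# read.csv() uses comment.char = "" by default, so the "#" in "#N/A"
# is not treated as the start of a comment and na.strings can match it
df <- read.csv(text = csv, na.strings = c("#N/A", "#N/A#N/A"))
df
```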


Even if you are stuck with the case you described in your question, you still don't need grepl.

df <- data.frame(
  a = c("#N/A", "A", "B", "#N/A", "C"),
  b = c("d", "#N/A", "e", "f", "123")
)
df[] <- lapply(
  df, 
  function(x)
  {
    x[x == "#N/A"] <- NA
    x
  }
)
df
##      a    b
## 1 <NA>    d
## 2    A <NA>
## 3    B    e
## 4 <NA>    f
## 5    C  123
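The df[] <- lapply(...) idiom matters: lapply() always returns a plain list, and assigning into df[] keeps the result a data frame. A small sketch, where blank() is a hypothetical helper name:

```r
df <- data.frame(a = c("#N/A", "A"), b = c("d", "#N/A"), stringsAsFactors = FALSE)

# Hypothetical helper: set exact "#N/A" cells to NA and return the column
blank <- function(x) { x[x == "#N/A"] <- NA; x }

as_list <- lapply(df, blank)   # plain list; the data.frame class is lost
df[] <- lapply(df, blank)      # assigning into df[] keeps df a data.frame

class(as_list)
class(df)
```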

Upvotes: 1

ndr

Reputation: 1437

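You need to return x from the function, and it's probably best to use apply in this case. Creating a data.frame with cbind is best avoided as well, because cbind first builds a matrix, which coerces every column to a single type.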

a = c("#N/A", "A", "B", "#N/A", "C")
b = c("d", "#N/A", "e", "f", "123")
df = data.frame(a=a, b=b, stringsAsFactors = FALSE)
str(df)
apply(df, 2, function(x){x[grepl("#N/A", x)] <- NA; return(x)})
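One thing to keep in mind with this approach: apply() converts the data frame to a matrix and returns a matrix. A minimal sketch of converting the result back, in case later code expects a data frame:

```r
a <- c("#N/A", "A", "B", "#N/A", "C")
b <- c("d", "#N/A", "e", "f", "123")
df <- data.frame(a = a, b = b, stringsAsFactors = FALSE)

# apply() coerces df to a character matrix and returns a matrix
m <- apply(df, 2, function(x) { x[grepl("#N/A", x)] <- NA; x })
class(m)

# Convert back to a data frame if downstream code needs one
df2 <- as.data.frame(m, stringsAsFactors = FALSE)
str(df2)
```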

Upvotes: 1

Andrie

Reputation: 179428

Your function needs to return x.

Try:

lapply(df, function(x){x[grepl("#N/A", x)] <- NA; x})

$a
[1] <NA> A    B    <NA> C   
Levels: #N/A A B C

$b
[1] d    <NA> e    f    123 
Levels: #N/A 123 d e f
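Why the original version printed NA: a function returns the value of its last expression, and an assignment such as x[...] <- NA evaluates (invisibly) to its right-hand side. A small sketch with hypothetical function names:

```r
# Last expression is an assignment, so the function returns its RHS (NA)
broken <- function(x) { x[x == "#N/A"] <- NA }

# Returning x explicitly gives back the modified vector
fixed <- function(x) { x[x == "#N/A"] <- NA; x }

r1 <- broken(c("#N/A", "A"))
r2 <- fixed(c("#N/A", "A"))
print(r1)  # NA
print(r2)  # NA "A"
```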

But you should really use gsub instead of grepl:

lapply(df, function(x)gsub("#N/A", NA, x))
$a
[1] NA  "A" "B" NA  "C"

$b
[1] "d"   NA    "e"   "f"   "123"
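Note that when the replacement is NA, gsub() sets the whole element to NA wherever the pattern matches, not just the matched substring. A small sketch with a hypothetical vector showing the difference from an exact comparison:

```r
x <- c("#N/A", "abc#N/A", "ok")

# replacement = NA: every element containing a match becomes NA
g <- gsub("#N/A", NA, x)
g

# Exact comparison: only elements equal to "#N/A" become NA
y <- x
y[y == "#N/A"] <- NA
y
```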

A better (more flexible and possibly easier to maintain) solution might be:

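replace_ptn <- function(x, ptn = "#N/A") {
  x <- as.character(x)  # factor columns would otherwise come back as integer level codes
  ifelse(x %in% ptn, NA, x)
}

lapply(df, replace_ptn)

$a
[1] NA  "A" "B" NA  "C"

$b
[1] "d"   NA    "e"   "f"   "123"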
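The flexibility comes from %in%: several missing-value markers can be passed at once. A minimal sketch, where to_na() is a hypothetical helper and the data frame deliberately uses factor columns to show why the as.character() step helps:

```r
# Hypothetical helper: set any cell whose value is one of `ptn` to NA.
# as.character() prevents ifelse() from returning factor level codes.
to_na <- function(x, ptn = "#N/A") {
  x <- as.character(x)
  ifelse(x %in% ptn, NA, x)
}

df <- data.frame(
  a = c("#N/A", "A", "#N/A#N/A"),
  b = c("d", "", "e"),
  stringsAsFactors = TRUE
)

# Treat "#N/A", "#N/A#N/A" and empty strings as missing; df[] keeps a data frame
df[] <- lapply(df, to_na, ptn = c("#N/A", "#N/A#N/A", ""))
df
```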

Upvotes: 5
