user113156

Reputation: 7107

grepl across multiple columns in R

I have the following data, which contains "n.a." strings that R does not recognise as missing values.

I am trying to remove these values using grepl

x <- x[!grepl("n.a.", x$Fixed.assets.EUR.Last.avail..yr),]

but I am trying to apply it across all columns instead of specifying each column name and having many lines of text.

What I currently have is

x <- sapply(x[, c(1:4)], !grepl("n.a."))

which produces errors and does not work.

Error in match.fun(FUN) : 
  '!grepl("n.a.", x[, 1:4])' is not a function, character or symbol

Data

dput(x)[1:6, ]
  Fixed.assets.EUR.Last.avail..yr Fixed.assets.EUR.Year...1 Fixed.assets.EUR.Year...2
1                      34,827,809                38,549,311                29,035,369
2                         755,256                   658,200                   573,888
3                       2,639,824                 2,739,205                 3,230,890
4                       2,543,367                 2,317,132                 2,994,769
5                       1,608,004                 1,702,838                 1,763,244
6                         661,875                   661,082                   584,166
  Fixed.assets.EUR.Year...3
1                30,416,099
2                      n.a.
3                 2,841,046
4                   693,370
5                 2,024,666
6                   565,007

Upvotes: 1

Views: 1746

Answers (3)

CPak

Reputation: 13581

Here are two alternative options.

Example Data

set.seed(1)
df <- as.data.frame(matrix(sample(c("n.a.", "good"), 20, replace=TRUE), ncol=2, byrow=TRUE))
head(df)

#     V1   V2
# 1 n.a. n.a.
# 2 good good
# 3 n.a. good
# 4 good good
# 5 good n.a.
# 6 n.a. n.a.

Convert n.a. to NA, then use complete.cases

data <- replace(df, df == "n.a.", NA)
data[complete.cases(data),]

#     V1   V2
# 2 good good
# 4 good good
# 9 good good

Use rowSums

df[rowSums(df == "n.a.") == 0,]

#     V1   V2
# 2 good good
# 4 good good
# 9 good good
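
A variant that stays closer to the grepl attempt in the question can be sketched as follows. The data frame x and its column names a and b are made-up stand-ins, not the asker's real data. The point is that lapply() and sapply() need a function as their second argument, so grepl() is wrapped in one.

```r
# Hypothetical stand-in for the asker's data (values are illustrative only)
x <- data.frame(
  a = c("34,827,809", "755,256", "2,639,824"),
  b = c("30,416,099", "n.a.", "2,841,046"),
  stringsAsFactors = FALSE
)

# One logical vector per column; fixed = TRUE matches "n.a." literally
# instead of treating each "." as a regex wildcard
hits <- lapply(x, function(col) grepl("n.a.", col, fixed = TRUE))

# Combine the per-column vectors with element-wise OR: TRUE means
# the row has "n.a." in at least one column
drop <- Reduce(`|`, hits)

# Keep only the rows without any match
clean <- x[!drop, ]
```

To restrict the check to certain columns, something like lapply(x[1:4], ...) should work in the same way.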

Upvotes: 0

AMS Nomad

Reputation: 73

If you want R to recognize "n.a." as NA values without removing the entire row (and hence losing real values across a row with an n.a. value in only one column), you can use this:

df[df=="n.a."] <- NA

Otherwise, you are better off using @Mako212's solution.
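
As a small illustration of the in-place replacement above, using a made-up data frame (the values are assumptions, not the asker's data): every row is kept, and only the "n.a." cells become NA.

```r
# Hypothetical example data
df <- data.frame(
  V1 = c("n.a.", "good", "good"),
  V2 = c("good", "good", "n.a."),
  stringsAsFactors = FALSE
)

# Replace every "n.a." cell with a real missing value
df[df == "n.a."] <- NA

# No rows were removed; count the missing values per column
n_rows <- nrow(df)
na_per_column <- colSums(is.na(df))
```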

Upvotes: 0

Mako212

Reputation: 7312

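Let me start by saying that the best practice here is to pass na.strings = c("n.a.") when you read in your data, so that R treats those values as NA from the start. That said, here is a way to use grepl() to remove any row that contains the string n.a. in any of the four columns. fixed = TRUE makes grepl() match the string literally rather than as a regular expression, in which each dot would match any character.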

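# Negate the logical index rather than using -which(): if no row matched, -which() would drop every row
x[!apply(x[, 1:4], 1, function(y) any(grepl("n.a.", y, fixed = TRUE))), ]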
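
The na.strings approach mentioned above might look like the sketch below. The CSV text and column names are invented for illustration, and read.csv() is assumed as the import function since the question does not say how the data was loaded. The asker's real values also contain thousands separators, which would need separate handling.

```r
# Hypothetical CSV content standing in for the real file
csv_text <- "id,assets_y1,assets_y3
1,100,300
2,200,n.a.
3,150,250"

# na.strings tells read.csv() which strings to treat as missing values,
# so the affected column is read as numeric with an NA
x <- read.csv(text = csv_text, na.strings = "n.a.")

# Rows containing any NA can then be dropped with standard tools
clean <- x[complete.cases(x), ]
```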

Upvotes: 4
