R Delete rows with multiple NULLs (edited)

I have a very large data frame with hundreds of variables. I want to delete rows that contain NULL in several consecutive columns. The data frame, df, looks something like this:

ID     V1     V2    V3    V4    V5   V6    V7    V8    V9    V10    V11     V12
ABC    1      2     3     4     1    2     3     NULL  4      1     AB      BC
DEF    2      3     NULL  4     2    3     4     1     2      3     AB      BC
GHI    NULL   NULL  NULL  NULL  NULL NULL  NULL  NULL  NULL  NULL   AB      BC
JKL    3      4     1     2     3    4     1     2     3      4     AB      BC
MNO    1      2     3     4     1    NULL  2     3     4      1     AB      BC

In this data frame, I want to delete ONLY the row where df$ID == "GHI", for example, so that I get:

ID     V1     V2    V3    V4    V5   V6    V7    V8    V9    V10    V11     V12
ABC    1      2     3     4     1    2     3     NULL  4      1     AB      BC
DEF    2      3     NULL  4     2    3     4     1     2      3     AB      BC
JKL    3      4     1     2     3    4     1     2     3      4     AB      BC
MNO    1      2     3     4     1    NULL  2     3     4      1     AB      BC

Thanks!

Upvotes: 0

Answers (4)

RHertel

Reputation: 23818

One can use rowSums to count the occurrences of "NULL" and subset the dataframe by retaining only those rows with at most one NULL:

newdf <- df1[rowSums(df1=="NULL")<2,]
#> newdf
#   ID V1 V2   V3 V4 V5   V6 V7   V8 V9 V10 V11 V12
#1 ABC  1  2    3  4  1    2  3 NULL  4   1  AB  BC
#2 DEF  2  3 NULL  4  2    3  4    1  2   3  AB  BC
#4 JKL  3  4    1  2  3    4  1    2  3   4  AB  BC
#5 MNO  1  2    3  4  1 NULL  2    3  4   1  AB  BC

data:

df1 <- read.table(text="ID     V1     V2    V3    V4    V5   V6    V7    V8    V9    V10    V11     V12
ABC    1      2     3     4     1    2     3     NULL  4      1     AB      BC
DEF    2      3     NULL  4     2    3     4     1     2      3     AB      BC
GHI    NULL   NULL  NULL  NULL  NULL NULL  NULL  NULL  NULL  NULL   AB      BC
JKL    3      4     1     2     3    4     1     2     3      4     AB      BC
MNO    1      2     3     4     1    NULL  2     3     4      1     AB      BC", 
header=TRUE)
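
A related sketch, assuming the "NULL" entries are meant to be missing values: passing na.strings = "NULL" to read.table turns them into NA while reading, so V1 to V10 stay numeric and the count can use is.na() instead of a string comparison. The names df2 and newdf2 are introduced here only for illustration.

```r
# Sketch: treat the string "NULL" as a missing value while reading.
df2 <- read.table(text = "ID V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
ABC 1 2 3 4 1 2 3 NULL 4 1 AB BC
DEF 2 3 NULL 4 2 3 4 1 2 3 AB BC
GHI NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL AB BC
JKL 3 4 1 2 3 4 1 2 3 4 AB BC
MNO 1 2 3 4 1 NULL 2 3 4 1 AB BC",
header = TRUE, na.strings = "NULL", stringsAsFactors = FALSE)

# Keep rows with fewer than two missing values.
newdf2 <- df2[rowSums(is.na(df2)) < 2, ]
newdf2$ID
# "ABC" "DEF" "JKL" "MNO"
```

Like the rowSums approach above, this counts all missing values in a row and does not check whether they are adjacent.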

Upvotes: 1

Rich Scriven

Reputation: 99391

Seems like a job for rle().

a <- !apply(df[paste0("V", 1:10)] == "NULL", 1, function(x) {
    with(rle(x), any(lengths[values] > 1))
})

df[a, ]
#    ID V1 V2   V3 V4 V5   V6 V7   V8 V9 V10 V11 V12
# 1 ABC  1  1    1  1  1    1  1 NULL  1   1  AB  BC
# 2 DEF  1  1 NULL  1  1    1  1    1  1   1  AB  BC
# 4 JKL  1  1    1  1  1    1  1    1  1   1  AB  BC
# 5 MNO  1  1    1  1  1 NULL  1    1  1   1  AB  BC

Data:

df <- structure(list(ID = c("ABC", "DEF", "GHI", "JKL", "MNO"), V1 = c("1", 
"1", "NULL", "1", "1"), V2 = c("1", "1", "NULL", "1", "1"), V3 = c("1", 
"NULL", "NULL", "1", "1"), V4 = c("1", "1", "NULL", "1", "1"), 
    V5 = c("1", "1", "NULL", "1", "1"), V6 = c("1", "1", "NULL", 
    "1", "NULL"), V7 = c("1", "1", "NULL", "1", "1"), V8 = c("NULL", 
    "1", "NULL", "1", "1"), V9 = c("1", "1", "NULL", "1", "1"
    ), V10 = c("1", "1", "NULL", "1", "1"), V11 = c("AB", "AB", 
    "AB", "AB", "AB"), V12 = c("BC", "BC", "BC", "BC", "BC")), .Names = c("ID", 
"V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", 
"V11", "V12"), class = "data.frame", row.names = c(NA, -5L))
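
To see why rle() fits here, it may help to look at a single row in isolation. The sketch below uses a made-up logical vector x (not taken from the data above) in which TRUE marks a "NULL" cell.

```r
# One row of NULL flags: TRUE marks a "NULL" cell (made-up example row).
x <- c(FALSE, TRUE, TRUE, FALSE, TRUE)

r <- rle(x)
r$lengths   # 1 2 1 1
r$values    # FALSE TRUE FALSE TRUE

# Lengths of the TRUE runs only; a run longer than 1 means consecutive NULLs.
r$lengths[r$values]           # 2 1
any(r$lengths[r$values] > 1)  # TRUE
```

Changing the threshold from 1 to some larger number would require a longer run of NULLs before a row is dropped.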

Upvotes: 3

DatamineR

Reputation: 9628

You could try:

df[!apply(df, 1, function(x) sum(sapply(x, function(x) x == "NULL"))>1),]
   ID V1 V2   V3 V4 V5   V6 V7   V8 V9 V10 V11 V12
1 ABC  1  1    1  1  1    1  1 NULL  1   1  AB  BC
2 DEF  1  1 NULL  1  1    1  1    1  1   1  AB  BC
4 JKL  1  1    1  1  1    1  1    1  1   1  AB  BC
5 MNO  1  1    1  1  1 NULL  1    1  1   1  AB  BC

Upvotes: 0

noname

Reputation: 482

If you want to drop only the rows with consecutive NA or "NULL" values, you can try:

df[!apply(df, 1, function(x) { idx <- which(is.na(x) | x == "NULL"); any(diff(idx) == 1) }), ]

Otherwise use the sum method.
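
The difference between the two rules shows up on rows whose NULLs are not adjacent. A minimal sketch, using a hypothetical two-row data frame d (not the data from the question):

```r
# Hypothetical data: row A has two NULLs that are not adjacent,
# row B has two adjacent NULLs.
d <- data.frame(ID = c("A", "B"),
                V1 = c("NULL", "NULL"),
                V2 = c("1", "NULL"),
                V3 = c("NULL", "1"),
                stringsAsFactors = FALSE)

is_null <- d[c("V1", "V2", "V3")] == "NULL"

# Count-based rule: flags any row with two or more NULLs.
drop_by_count <- rowSums(is_null) > 1

# Run-based rule: flags only rows where two NULLs sit side by side.
drop_by_run <- apply(is_null, 1, function(x) {
  r <- rle(x)
  any(r$lengths[r$values] > 1)
})

drop_by_count  # both rows flagged
drop_by_run    # only the second row flagged
```

For the example in the question both rules give the same result, since GHI is the only row with more than one NULL.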

Upvotes: 1
