deleting rows in R containing one blank column

Question

I have a data set like:

  ID               EMAIL_ID.x        ID.y     EMAIL_ID.y
 60842               k@aol.com      60842        k@aol.com
 90853                a.b.c         NA            
 90854                b.c.d         NA            
 83907               h@gwi.net      02854         
 83908             s@hotmail.com    00952         
 83909             l@rediff.com     78895

In this data, i only want to retain the rows in which both the 3rd and the 4th column are either populated or both not populated.

Hence the result should be like:

   ID               EMAIL_ID.x       ID.y     EMAIL_ID.y
 60842               k@aol.com      60842        k@aol.com
 90853                a.b.c         NA            
 90854                b.c.d         NA

In the above, the rows where column 3 had values but column 4 did not, have been deleted.

Please suggest how can this be done in R.

Thankyou!

akrun · Accepted Answer

One option is to check whether there are NA elements (is.na(...)) in both (&) columns or (|) both non-NA elements (by negating !) and use that logical index to subset the rows.

df1[is.na(df1[,3]) & is.na(df1[,4]) | !is.na(df1[,3]) & !is.na(df1[,4]),]
#    ID EMAIL_ID.x JPNUMBER.y EMAIL_ID.y
#1 60842  k@aol.com      60842  k@aol.com
#2 90853      a.b.c         NA       
#3 90854      b.c.d         NA

Or another option is to apply rowSums on the logical matrix (is.na(df1[3:4])), check if the sum of NA elements are either 0 or 2 using %in% (if it is 0 - it implies there are no NA elements and 2 means both the columns have NA) and use the logical vector to subset the rows.

df1[rowSums(is.na(df1[3:4])) %in% c(0,2),]
#    ID EMAIL_ID.x JPNUMBER.y EMAIL_ID.y
#1 60842  k@aol.com      60842  k@aol.com
#2 90853      a.b.c         NA       
#3 90854      b.c.d         NA

Or another option is Reduce with lapply (to avoid conversion to matrix - in case the dataset is really big)

df1[Reduce(`+`, lapply(df1[3:4], is.na)) != 1,]

NOTE: If the NA in the OP's dataset is not real NA, we need to convert it to real NA before doing this (assuming the 4th column is character class)

is.na(df1[,4]) <- df1[,4] == ""

data

df1 <- structure(list(ID = c(60842L, 90853L, 90854L, 83907L, 83908L, 
83909L), EMAIL_ID.x = c("k@aol.com", "a.b.c", "b.c.d", "h@gwi.net", 
"s@hotmail.com", "l@rediff.com"), JPNUMBER.y = c(60842L, NA, 
NA, 2854L, 952L, 78895L), EMAIL_ID.y = c("k@aol.com", NA, NA, 
NA, NA, NA)), .Names = c("ID", "EMAIL_ID.x", "JPNUMBER.y", "EMAIL_ID.y"
), row.names = c(NA, -6L), class = "data.frame")

deleting rows in R containing one blank column

Answers (1)

data

Related Questions