Ioannis

Reputation: 43

How to remove rows based on the column values

I have a large data.frame. Here is a small example (built as a matrix):

> m <- matrix(c(3,6,2,5,3,3,2,5,4,3,5,3,6,3,6,7,5,8,2,5,5,4,9,2,2), nrow=5, ncol=5)
> colnames(m) <- c("A", "B", "C", "D", "E")
> rownames(m) <- c("a", "b", "c", "d", "e")
> m
  A B C D E
a 3 3 5 7 5
b 6 2 3 5 4
c 2 5 6 8 9
d 5 4 3 2 2
e 3 3 6 5 2

I would like to remove all rows where the A and/or B columns have a greater value than the C D and E columns. In this case rows b, d, and e should be removed, and I should get this:

  A B C D E
a 3 3 5 7 5
c 2 5 6 8 9

I cannot remove them one by one because the data.frame has more than a million rows. Thanks.

Upvotes: 3

Views: 149

Answers (3)

acylam

Reputation: 18661

Here's another solution with apply:

m[apply(m, 1, function(x) max(x[1], x[2]) < min(x[3], x[4], x[5])),]

Result:

  A B C D E
a 3 3 5 7 5
c 2 5 6 8 9

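I think what you actually meant is to remove rows where max(A, B) > min(C, D, E), which translates to keeping rows where both A and B are smaller than each of C, D, and E. Note that the strict < in the code above also drops rows where max(A, B) equals min(C, D, E); use <= if such ties should be kept.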
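As a small variation on the apply() call above, the columns can be selected by name instead of by position, which may be easier to read when the real data has more columns. This is only a sketch: the helper names lo, hi, keep, and kept are made up for illustration, and it assumes the matrix has column names as in the question.

```r
# Example matrix from the question
m <- matrix(c(3,6,2,5,3,3,2,5,4,3,5,3,6,3,6,7,5,8,2,5,5,4,9,2,2), nrow=5, ncol=5)
colnames(m) <- c("A", "B", "C", "D", "E")
rownames(m) <- c("a", "b", "c", "d", "e")

# Hypothetical helper names: columns on each side of the comparison
lo <- c("A", "B")
hi <- c("C", "D", "E")

# apply() passes each row as a named vector, so columns can be picked by name
keep <- apply(m, 1, function(x) max(x[lo]) < min(x[hi]))

# drop = FALSE keeps the result a matrix even if only one row survives
kept <- m[keep, , drop = FALSE]
print(kept)
```

Because apply() calls the function once per row, this is likely to be slower than a fully vectorized approach on a million rows.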

Upvotes: 0

ngamita

Reputation: 329

# Create the example matrix
m <- matrix(c(3,6,2,5,3,3,2,5,4,3,5,3,6,3,6,7,5,8,2,5,5,4,9,2,2), nrow=5, ncol=5)
colnames(m) <- c("A", "B", "C", "D", "E")
rownames(m) <- c("a", "b", "c", "d", "e")

# Convert to a data frame and work on a copy
m <- as.data.frame(m)
df_n <- m
for (i in seq_len(nrow(m))) {
  # Mark the row for removal when the larger of A, B exceeds the smallest of C, D, E
  if (max(m[i, c("A", "B")]) > min(m[i, c("C", "D", "E")])) {
    df_n[i, ] <- NA
  }
}
# Drop the rows that were marked with NA
df_n <- df_n[complete.cases(df_n), ]
print(df_n)

Results
> print(df_n)
  A B C D E
a 3 3 5 7 5
c 2 5 6 8 9

Upvotes: 0

John Coleman

Reputation: 51998

Use subsetting together with pmin() and pmax() to retain the rows that you want. I'm not sure that I fully understand your criteria (you said "C D and E", but since you want to throw away row e, I think that you meant C, D, or E), but the following seems to do what you want:

> m[pmax(m[,"A"],m[,"B"])<=pmin(m[,"C"],m[,"D"],m[,"E"]),]
  A B C D E
a 3 3 5 7 5
c 2 5 6 8 9
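Since the question mentions a data.frame rather than a matrix, the same pmax()/pmin() idea can be sketched for a data frame with the column groups held in variables. This is an illustration only: the names df, lo, hi, keep, and result are assumptions, and do.call() is used simply to pass each group of columns to pmax() and pmin() as separate arguments.

```r
# Example data from the question, converted to a data frame
m <- matrix(c(3,6,2,5,3,3,2,5,4,3,5,3,6,3,6,7,5,8,2,5,5,4,9,2,2), nrow=5, ncol=5)
colnames(m) <- c("A", "B", "C", "D", "E")
rownames(m) <- c("a", "b", "c", "d", "e")
df <- as.data.frame(m)

# Hypothetical helper names: columns on each side of the comparison
lo <- c("A", "B")
hi <- c("C", "D", "E")

# Row-wise maximum of the lo columns and row-wise minimum of the hi columns,
# computed in one vectorized call each instead of once per row
keep <- do.call(pmax, as.list(df[lo])) <= do.call(pmin, as.list(df[hi]))

result <- df[keep, ]
print(result)
```

Both pmax() and pmin() work on whole columns at once, so no per-row function call is needed, which should matter at a million rows.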

Upvotes: 2
