AME

Reputation: 5300

Subsetting blank rows from data frame in R

How can I subset rows from a data frame when the values in a given column are blank or NA? For example:

    x <- c(1,2,3,4,"","","")
    y <- c("A","B","C","D","E","F","G")
    z <- c(100,200,300,400,500,600,700)
    xyz <- data.frame(x,y,z)
    View(xyz)


    g1 <- subset(xyz, subset=(x > 0))

Returns:

    Warning message:
    In Ops.factor(x, 0) : > not meaningful for factors

How can I get it to return a new data frame that is a subset of the original, containing only the rows where the x column is greater than zero?

Upvotes: 2

Views: 11929

Answers (2)

Hong Ooi

Reputation: 57686

When you created your data frame, you specified that x should be a factor variable.

(Technically you specified that it should be character, but data.frame has read your mind and converted it to factor for you. Again, technically you didn't specify that it should be character, but R has read your mind and, because you tried to combine numbers and characters in the one vector, it's coerced them all into a vector of character mode.)
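The two coercion steps described above can be checked directly. This is a minimal sketch, not part of the original answer; it assumes `stringsAsFactors = TRUE` is passed explicitly, because since R 4.0.0 `data.frame()` no longer converts character columns to factors by default.

```r
# Mixing numbers and strings in c() coerces every element to character.
x <- c(1, 2, 3, 4, "", "", "")
class(x)       # "character"

# Assumption: stringsAsFactors = TRUE reproduces the older default that
# the question ran under; R >= 4.0.0 would keep x as character instead.
xyz <- data.frame(x, stringsAsFactors = TRUE)
class(xyz$x)   # "factor"
```

On a current R installation with default settings, the column would stay character, and `x > 0` would then do a string comparison rather than raise the factor warning.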

Because of this, a "greater than zero" comparison doesn't make sense in this context. I'm going to read your mind and conclude that you actually want x to be numeric, but with an allowance for situations where the value is not available. In that case, you should do:

    xyz$x <- as.numeric(as.character(xyz$x))
    subset(xyz, !is.na(x))
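Once x is numeric, the comparison from the question works as well. The following is a sketch of the whole sequence, assuming the data from the question and an explicit `stringsAsFactors = TRUE` so that it behaves the same on R 4.0.0 and later; the name `g1` is taken from the question.

```r
x <- c(1, 2, 3, 4, "", "", "")
y <- c("A", "B", "C", "D", "E", "F", "G")
z <- c(100, 200, 300, 400, 500, 600, 700)
# Assumption: force factor conversion to match the setup in the question.
xyz <- data.frame(x, y, z, stringsAsFactors = TRUE)

# Convert via character first: as.numeric() applied directly to a factor
# returns the internal level codes, not the printed values.
xyz$x <- as.numeric(as.character(xyz$x))

# The blank strings are now NA, and subset() drops rows where the
# condition evaluates to NA, so only the four numeric rows remain.
g1 <- subset(xyz, x > 0)
```

Here `g1` should hold the rows with y equal to A through D.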

Upvotes: 5

alexwhan

Reputation: 16026

Because x is stored as a factor, being greater than a value doesn't make any sense here. You can use indexing:

    xyz[xyz$x != "",]
    #   x y   z
    # 1 1 A 100
    # 2 2 B 200
    # 3 3 C 300
    # 4 4 D 400

NA is different from "", and you can test for it with is.na(). So if the values in this case were NA rather than "", xyz[!is.na(xyz$x),] would do the trick.
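If a column might contain both blanks and NA, the two tests can be combined. This is a sketch on hypothetical data (the values below are made up for illustration and are not from the question); the names `keep` and `clean` are also illustrative.

```r
# Hypothetical data with one blank and one NA in x.
xyz <- data.frame(
  x = c("1", "2", "", NA, "5"),
  y = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Test is.na() first: NA != "" evaluates to NA, and indexing with an NA
# logical would otherwise produce a row filled with NA.
keep <- !is.na(xyz$x) & xyz$x != ""
clean <- xyz[keep, ]
```

`clean` should then contain only the rows where x is neither blank nor missing.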

Upvotes: 4
