How can I subset rows from a data frame when the values in a given column are blank or NA? For example:
x <- c(1,2,3,4,"","","")
y <- c("A","B","C","D","E","F","G")
z <- c(100,200,300,400,500,600,700)
xyz <- data.frame(x,y,z)
View(xyz)
g1 <- subset(xyz, subset=(x > 0))
This returns:
Warning message:
In Ops.factor(x, 0) : > not meaningful for factors
How can I get it to return a new data frame that is a subset of the original, containing only the rows where column x is greater than zero?
Upvotes: 2
Reputation: 57686
When you created your data frame, you specified that x should be a factor variable.
(Technically you specified that it should be character, but data.frame read your mind and converted it to a factor for you. Again, technically you didn't specify that it should be character either, but R read your mind: because you combined numbers and characters in the one vector, it coerced them all into a vector of character mode.)
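A small R sketch of the two coercion steps described above. Note that since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so in current R x stays character and x > 0 does not warn; it compares strings instead ("1" > "0" is TRUE, "" > "0" is FALSE). The sketch therefore passes stringsAsFactors = TRUE explicitly to reproduce the older behaviour; the names xyz_old and xyz_new are illustrative only.

```r
# Combining numbers and "" in c() coerces everything to character
x <- c(1, 2, 3, 4, "", "", "")
print(class(x))  # "character"

y <- c("A", "B", "C", "D", "E", "F", "G")
z <- c(100, 200, 300, 400, 500, 600, 700)

# stringsAsFactors = TRUE reproduces the pre-R 4.0.0 default,
# which turned character columns into factors
xyz_old <- data.frame(x, y, z, stringsAsFactors = TRUE)
print(class(xyz_old$x))  # "factor"

# With the R >= 4.0.0 default (stringsAsFactors = FALSE), x stays character
xyz_new <- data.frame(x, y, z, stringsAsFactors = FALSE)
print(class(xyz_new$x))  # "character"
```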
Because of this, "greater than zero" doesn't make sense as a comparison in this context. I'm going to read your mind and conclude that you actually want x to be numeric, with NA for values that are not available. In that case, you should do
xyz$x <- as.numeric(as.character(xyz$x))
subset(xyz, !is.na(x))
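A rough sketch of why the as.character() step matters, assuming the factor version of xyz (built here with stringsAsFactors = TRUE, as older R versions did by default). Calling as.numeric() directly on a factor returns the internal level codes rather than the printed values; the names codes and vals are illustrative only. Because subset() treats a condition that evaluates to NA as FALSE, the original x > 0 test also works once x is numeric.

```r
x <- c(1, 2, 3, 4, "", "", "")
y <- c("A", "B", "C", "D", "E", "F", "G")
z <- c(100, 200, 300, 400, 500, 600, 700)
xyz <- data.frame(x, y, z, stringsAsFactors = TRUE)

# Pitfall: as.numeric() on a factor returns the level codes
# ("" sorts as the first level, so the codes are 2 3 4 5 1 1 1)
codes <- as.numeric(xyz$x)

# Converting to character first recovers the printed values;
# the empty strings become NA
vals <- as.numeric(as.character(xyz$x))

xyz$x <- vals
# subset() drops rows where the condition is NA, so x > 0 now works
g1 <- subset(xyz, x > 0)
print(g1)
```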
Upvotes: 5
Reputation: 16026
Because x is stored as a factor, "greater than" a value doesn't make any sense here. You can use indexing instead:
xyz[xyz$x != "",]
#   x y   z
# 1 1 A 100
# 2 2 B 200
# 3 3 C 300
# 4 4 D 400
NA is different from "", and you can test for it with is.na(). So if the values in this case were NA rather than "", xyz[!is.na(xyz$x),] would do the trick.
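If a column can contain both blanks and real NA values, the two tests can be combined. This is a sketch using a hypothetical variant of the question's data in which one blank is replaced by NA; the name keep is illustrative. Using only xyz$x != "" on such data would return an all-NA row for each NA, because indexing with a logical NA selects a row of NAs.

```r
# Hypothetical variant of the data: one blank replaced by NA
x <- c(1, 2, 3, 4, "", NA, "")
y <- c("A", "B", "C", "D", "E", "F", "G")
z <- c(100, 200, 300, 400, 500, 600, 700)
xyz <- data.frame(x, y, z)

# Keep rows whose x is neither NA nor an empty string.
# For an NA row, !is.na() is FALSE and FALSE & NA is FALSE,
# so keep contains no NA values
keep <- !is.na(xyz$x) & xyz$x != ""
print(xyz[keep, ])
```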
Upvotes: 4