Reputation: 1771
I have the following data frame called planets.df:
type               | planets | diameter | rotation | rings
---------------------------------------------------------
Terrestrial planet | Mercury |    0.382 |    58.64 | FALSE
Terrestrial planet | Venus   |    0.949 |  -243.02 | FALSE
Terrestrial planet | Earth   |    1.000 |     1.00 | FALSE
Terrestrial planet | Mars    |    0.532 |     1.03 | FALSE
Gas giant          | Jupiter |   11.209 |     0.41 | TRUE
Gas giant          | Saturn  |    9.449 |     0.43 | TRUE
Gas giant          | Uranus  |    4.007 |    -0.72 | TRUE
Gas giant          | Neptune |    3.883 |     0.67 | TRUE
I want to get all the planets that have rings, i.e. rings = TRUE, with the following code:
rings.vector <- planets.df$rings
planets.with.rings.df <- planets.df[rings.vector, ]
Can someone tell me why this works? I didn't come up with the code myself but want to understand it. Does the part [rings.vector,] mean rings = TRUE?
Thanks!
Upvotes: 1
Views: 160
Reputation: 6534
Another angle on this is to use subset(), which is rather intuitive: it extracts only those rows of the data frame for which the condition (the second argument) is true.
planets.with.rings.df <- subset(planets.df, rings == TRUE)
or just simply
planets.with.rings.df <- subset(planets.df, rings)
The "== TRUE" in the first version is redundant, since rings is already a logical (Boolean) vector!
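To check that subset() and the bracket form from the question pick the same rows, here is a minimal sketch. It rebuilds a cut-down planets.df with values copied from the question; the names by.subset and by.bracket are made up for this illustration.

```r
# Cut-down copy of the question's data frame (only three of the columns).
planets.df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883),
  rings    = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Two ways of keeping only the rows where rings is TRUE.
by.subset  <- subset(planets.df, rings)
by.bracket <- planets.df[planets.df$rings, ]

# Compare the two results; both contain the four ringed planets.
all.equal(by.subset, by.bracket)
nrow(by.subset)
```

subset() is convenient interactively; the bracket form is the one that generalizes to indexing other objects.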
Upvotes: 0
Reputation: 12875
It works because in a df[<condition>, ] type of statement, the condition part is basically a vector of TRUE/FALSE values. The rows corresponding to TRUE are kept and the ones corresponding to FALSE are omitted. rings.vector is already a vector of TRUE/FALSE values. You could instead use rings.vector == TRUE as the condition, which would give the same result.
In your case it probably doesn't matter, but be careful if you have NAs in your condition vector or in the column you are filtering on.
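To make the NA caveat concrete, here is a small sketch on a made-up three-row data frame (df, with.na and no.na are names invented for this example). An NA in a logical row index does not drop the row; it produces a row filled with NA. Wrapping the condition in which() is one common way to avoid that.

```r
# Toy data: the middle flag is missing.
df <- data.frame(name = c("a", "b", "c"),
                 flag = c(TRUE, NA, FALSE))

with.na <- df[df$flag, ]         # the NA index yields an extra all-NA row
no.na   <- df[which(df$flag), ]  # which() keeps only the positions that are TRUE

nrow(with.na)  # 2: row "a" plus one all-NA row
nrow(no.na)    # 1: only row "a"
```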
Upvotes: 2
Reputation: 7714
Here is a small reproducible example. I've added some examples using data.table.
Please correct the code if it's not right.
set.seed(1)  # fixed seed so the random data is the same on every run
data <- data.frame(id = 1:100, x = rnorm(100, 100, 50))
data$flag <- data$x > 100  # the comparison already returns TRUE/FALSE
head(data)
# Rows where flag is FALSE can be selected with FALSE or with 0 (FALSE == 0 is TRUE)
data[data$flag == FALSE, ]
data[data$flag == 0, ]
str(data$flag)
# flag is a logical vector:
class(data$flag)
# Using data.table
library("data.table")
DT <- data.table(data)
setkey(DT, flag)  # keyed lookups below match on the flag column
DT[J(FALSE)]
DT[J(TRUE)]
# Aggregate (group by)
DT[, quantile(x), by = flag]
DT[, list(mean = mean(x),
          sum = sum(x),
          median = median(x)),
   by = flag]
Upvotes: 0
Reputation: 4385
When you have a data frame, you can reference specific rows and columns in two different ways: by number, as in df[row_numbers, column_numbers], or by a logical vector such as rings.vector. When you use df[rings.vector, ], R looks for the row numbers that match the positions of all the TRUE values in rings.vector and pulls out the corresponding rows. In the above example, nothing is being checked for in the columns, but you need the comma in the brackets to indicate that the object before the comma refers to rows. Most of the time you'll use Boolean values for rows and specific numbers or names for columns, for simplicity.
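As a rough illustration of the two ways of indexing rows, the sketch below uses a made-up four-row data frame (df, row.numbers, by.logical and by.number are names invented here). which() converts the logical vector into the row numbers of its TRUE values, so both forms select the same rows.

```r
df <- data.frame(planet = c("Earth", "Mars", "Saturn", "Uranus"),
                 rings  = c(FALSE, FALSE, TRUE, TRUE))

rings.vector <- df$rings
row.numbers  <- which(rings.vector)  # positions of the TRUE values: 3 and 4

by.logical <- df[rings.vector, ]     # rows chosen by TRUE/FALSE
by.number  <- df[row.numbers, ]      # the same rows chosen by number

all.equal(by.logical, by.number)

# Logical vector for the rows, a column name after the comma.
ringed <- df[rings.vector, "planet"]
```

Leaving the column slot empty after the comma, as in df[rings.vector, ], keeps every column.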
Upvotes: 0
Reputation: 10956
rings.vector is a vector of TRUE or FALSE indicators taken from the rings column. If you want to subset the rows whose rings value is TRUE, then [rings.vector, ] selects those rows (where rings == TRUE) and all columns.
Upvotes: 3