PMa

Reputation: 1771

Data frame with condition

I have the following data frame called planets.df:

       type        | planets | diameter | rotation | rings
-----------------------------------------------------------
Terrestrial planet | Mercury |    0.382 |    58.64 | FALSE
Terrestrial planet | Venus   |    0.949 |  -243.02 | FALSE
Terrestrial planet | Earth   |    1.000 |     1.00 | FALSE
Terrestrial planet | Mars    |    0.532 |     1.03 | FALSE
Gas giant          | Jupiter |   11.209 |     0.41 | TRUE
Gas giant          | Saturn  |    9.449 |     0.43 | TRUE
Gas giant          | Uranus  |    4.007 |    -0.72 | TRUE
Gas giant          | Neptune |    3.883 |     0.67 | TRUE

I want to get all the planets that have rings, i.e. rings = TRUE, using the following code:

rings.vector <- planets.df$rings
planets.with.rings.df <- planets.df[rings.vector,]

Can someone tell me why this works? I didn't come up with the code myself but want to understand it. Does the part [rings.vector,] mean rings == TRUE?

Thanks!

Upvotes: 1

Views: 160

Answers (5)

datawookie

Reputation: 6534

Another angle on this is to use subset(), which is rather intuitive: it extracts only those rows of the data frame for which the condition (second argument) is TRUE.

planets.with.rings.df <- subset(planets.df, rings == TRUE)

or just simply

planets.with.rings.df <- subset(planets.df, rings)

The "== TRUE" in the first solution is redundant, since rings is already a logical (Boolean) vector!
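As a rough check of the above, here is a minimal sketch that rebuilds part of the question's table and compares subset() with bracket indexing. The column values are copied from the question; the names by.subset and by.bracket are only illustrative.

```r
# Rebuild a reduced version of the question's data frame
# (only the columns needed for the comparison).
planets.df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883),
  rings    = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# The logical column itself is the condition; no "== TRUE" needed.
by.subset  <- subset(planets.df, rings)
by.bracket <- planets.df[planets.df$rings, ]

# Both approaches keep the same four planets.
print(by.subset$planets)
print(by.bracket$planets)
```

With no NAs in the rings column, the two forms select the same rows; they differ only in how they treat missing values.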

Upvotes: 0

TheComeOnMan

Reputation: 12875

It works because in a df[<condition>,] type of statement, the condition part is basically a vector of T/F. The rows corresponding to TRUE are kept and the ones corresponding to FALSE are omitted.

rings.vector is already a vector of T/F. You could instead use rings.vector == TRUE as the condition, which would give the same result.

In your case it probably doesn't matter, but be careful if you have NAs in your condition vector or in the column you are filtering on: an NA in the index produces a row of NAs instead of being dropped.
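To illustrate the NA caveat, here is a small sketch on a hypothetical three-row data frame (the data and the names with.na, with.which and with.sub are made up for this example, not taken from the question).

```r
# Hypothetical toy data: the 'rings' value for "B" is unknown (NA).
df <- data.frame(
  name  = c("A", "B", "C"),
  rings = c(TRUE, NA, FALSE),
  stringsAsFactors = FALSE
)

with.na    <- df[df$rings, ]         # the NA in the index yields an all-NA row
with.which <- df[which(df$rings), ]  # which() returns only the TRUE positions
with.sub   <- subset(df, rings)      # subset() treats NA conditions as FALSE

nrow(with.na)     # 2: row "A" plus one row filled with NA
nrow(with.which)  # 1
nrow(with.sub)    # 1
```

Wrapping the condition in which(), or using subset(), is one common way to avoid the extra NA row.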

Upvotes: 2

marbel

Reputation: 7714

Here is a small reproducible example. I've added some examples using data.table. Please correct the code if it's not right.

data <- data.frame(id = 1:100, x = rnorm(100, 100, 50))
data$flag <- data$x > 100  # the comparison already returns TRUE/FALSE
head(data)

# FALSE compares equal to 0, so both conditions select the same rows
data[data$flag == FALSE, ]
data[data$flag == 0, ]
str(data$flag)

# As it's of class:
class(data$flag)

# Using Data Table
library("data.table")
DT <- data.table(data)

setkey(DT, flag)
DT[J(FALSE)]
DT[J(TRUE)]

# Aggregate (Group by)
DT[, quantile(x), by = flag]

DT[, list(mean = mean(x),
          sum = sum(x),
          median = median(x)),
   by = flag]

Upvotes: 0

Max Candocia

Reputation: 4385

When you have a data frame, you can reference specific rows and columns in two different ways.

  1. You can give the numbers of the rows and columns explicitly by using df[row_numbers,column_numbers], or
  2. You can use logical vectors (TRUE/FALSE) to indicate which rows/columns you want. With df[rings.vector,], R takes the positions of all the TRUE values in rings.vector and pulls out the rows at those positions.

In the above example, nothing is being selected among the columns, but you need the comma in the brackets to indicate that the object before the comma refers to rows. Most of the time you'll use logical values only for rows and specific numbers (or names) for columns, for simplicity.
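The two indexing styles can be compared side by side. This is only a sketch on a reduced copy of the question's data (two columns, values taken from the table); the names by.number, by.logical and one.column are illustrative.

```r
# Reduced copy of the question's data frame.
planets.df <- data.frame(
  planets = c("Mercury", "Venus", "Earth", "Mars",
              "Jupiter", "Saturn", "Uranus", "Neptune"),
  rings   = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)
rings.vector <- planets.df$rings

by.number  <- planets.df[5:8, ]           # 1. explicit row numbers
by.logical <- planets.df[rings.vector, ]  # 2. one TRUE/FALSE per row

# which() shows the row positions that the logical vector stands for.
which(rings.vector)

# Without the comma, the index refers to columns, not rows:
one.column <- planets.df[2]  # the 'rings' column as a one-column data frame
```

Here both by.number and by.logical hold the four ringed planets, since rows 5 to 8 are exactly the positions where rings.vector is TRUE.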

Upvotes: 0

alittleboy

Reputation: 10956

rings.vector is a vector of TRUE/FALSE indicators taken from the rings column. If you want to keep only the rows where rings is TRUE, then [rings.vector, ] selects those rows (rings == TRUE) together with all columns.

Upvotes: 3
