Randy Minder

Reputation: 48402

Subsetting a data frame - Confused about syntax

Say I have the following data frame:

  LungCap Age Height Smoke Gender Caesarean
1   6.475   6   62.1    no   male        no
2  10.125  18   74.7   yes female        no
3   9.550  16   69.7    no female       yes
4  11.125  14   71.0    no   male        no
5   4.800   5   56.9    no   male        no
6   6.225  11   58.7    no female        no

Now I want to select all rows where Age is greater than 11 and Gender is "female". This gets me what I want:

y[y$Age>11&y$Gender=="female",]

  LungCap Age Height Smoke Gender Caesarean
2  10.125  18   74.7   yes female        no
3   9.550  16   69.7    no female       yes

But this does not:

y[y$Age>11&y$Gender=="female"]

  Age Height
1   6   62.1
2  18   74.7
3  16   69.7
4  14   71.0
5   5   56.9
6  11   58.7

I'm very new to R and I don't understand what this second query is doing, other than that it's not giving me what I want.

Upvotes: 0

Views: 109

Answers (1)

thepule

Reputation: 1751

When you subset a data frame with the first syntax, the first numeric (or logical) vector inside the square brackets selects the rows, while the second vector, after the comma, selects the columns.

If you do not explicitly insert anything after the comma, R assumes you want all the columns.

If you leave out the comma entirely, R indexes the data frame as a list of columns, so the single index selects columns, not rows.

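In your case, y$Age>11&y$Gender=="female" is a logical vector with one element per row: FALSE TRUE TRUE FALSE FALSE FALSE. Only positions 2 and 3 are TRUE. Without the comma, R applies that vector to the columns instead of the rows, so it keeps columns 2 and 3 for every row. That is why you get Age and Height.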
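
To see this for yourself, here is a minimal sketch that rebuilds the data frame from the question and runs both forms. The names keep, rows_kept, cols_kept and rows_subset are just illustrative, and I'm assuming the text columns are plain character vectors (they could equally be factors in your real data, which would not change the result).

```r
# Rebuild the data frame from the question (values copied from the post).
y <- data.frame(
  LungCap   = c(6.475, 10.125, 9.550, 11.125, 4.800, 6.225),
  Age       = c(6, 18, 16, 14, 5, 11),
  Height    = c(62.1, 74.7, 69.7, 71.0, 56.9, 58.7),
  Smoke     = c("no", "yes", "no", "no", "no", "no"),
  Gender    = c("male", "female", "female", "male", "male", "female"),
  Caesarean = c("no", "no", "yes", "no", "no", "no")
)

# One TRUE/FALSE per row: FALSE TRUE TRUE FALSE FALSE FALSE
keep <- y$Age > 11 & y$Gender == "female"

# With the comma, keep filters rows: rows 2 and 3, all columns.
rows_kept <- y[keep, ]

# Without the comma, keep filters columns: Age and Height, all rows.
cols_kept <- y[keep]

# subset() always filters rows, so the comma cannot be forgotten.
rows_subset <- subset(y, Age > 11 & Gender == "female")
```

Note that the comma-less form only happens to run here because the data frame has as many columns as rows (6 of each), so the logical vector lines up with the columns. Printing keep on its own is usually the quickest way to check what an index is actually selecting.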

Upvotes: 3
