Reputation: 48402
Say I have the following data frame:
  LungCap Age Height Smoke Gender Caesarean
1   6.475   6   62.1    no   male        no
2  10.125  18   74.7   yes female        no
3   9.550  16   69.7    no female       yes
4  11.125  14   71.0    no   male        no
5   4.800   5   56.9    no   male        no
6   6.225  11   58.7    no female        no
Now I want to select all rows where Age is greater than 11 and Gender is female. This gets me what I want:
y[y$Age > 11 & y$Gender == "female", ]
  LungCap Age Height Smoke Gender Caesarean
2  10.125  18   74.7   yes female        no
3   9.550  16   69.7    no female       yes
But this does not:
y[y$Age > 11 & y$Gender == "female"]
  Age Height
1   6   62.1
2  18   74.7
3  16   69.7
4  14   71.0
5   5   56.9
6  11   58.7
I'm very new to R and I don't understand what this second query is doing, other than that it's not giving me what I want.
Upvotes: 0
Views: 109
Reputation: 1751
When you subset a data frame with the first syntax, the first numeric (or logical) vector inside the square brackets selects rows, and the second one (after the comma) selects columns.
If you leave the position after the comma empty, R returns all the columns.
If you omit the comma altogether, R treats the single vector as a column index.
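A minimal sketch of those three forms, using a small made-up data frame `df` (hypothetical, not the data from the question):

```r
# Hypothetical 3-row, 3-column data frame, for illustration only
df <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c(TRUE, FALSE, TRUE))

rows_and_cols <- df[c(1, 3), c("a", "b")]  # rows 1 and 3, columns a and b
rows_only     <- df[c(1, 3), ]             # rows 1 and 3, all columns
cols_only     <- df[c(1, 3)]               # no comma: columns 1 and 3 (a and c)

dim(rows_and_cols)  # 2 2
dim(rows_only)      # 2 3
names(cols_only)    # "a" "c"
```

The same index vector, c(1, 3), picks rows in the second call and columns in the third; the only difference is the comma.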
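In your case, y$Age > 11 & y$Gender == "female"
is a logical vector with one element per row, and it is TRUE only at positions 2 and 3. Without the comma, R reads it as a column selector and keeps columns 2 and 3, which is why you get Age and Height. Because the data frame happens to have six rows and six columns, the vector's length matches the number of columns exactly, so each element lines up with one column.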
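To see this concretely, the sketch below rebuilds the data frame from the question and stores the condition in a variable. The name `keep` is my own choice, and the column types are assumed from the printed output (the original may use factors for the text columns, which would not change the result).

```r
# Rebuild the data frame from the question (column types are assumed)
y <- data.frame(
  LungCap   = c(6.475, 10.125, 9.550, 11.125, 4.800, 6.225),
  Age       = c(6, 18, 16, 14, 5, 11),
  Height    = c(62.1, 74.7, 69.7, 71.0, 56.9, 58.7),
  Smoke     = c("no", "yes", "no", "no", "no", "no"),
  Gender    = c("male", "female", "female", "male", "male", "female"),
  Caesarean = c("no", "no", "yes", "no", "no", "no")
)

# One TRUE/FALSE value per row
keep <- y$Age > 11 & y$Gender == "female"
keep         # FALSE TRUE TRUE FALSE FALSE FALSE
which(keep)  # 2 3

rownames(y[keep, ])  # "2" "3": with the comma, rows 2 and 3
names(y[keep])       # "Age" "Height": without it, columns 2 and 3
```

Printing `keep` on its own is a quick way to check what a condition selects before using it as an index.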
Upvotes: 3