ForEverNewbie

Reputation: 512

Attempting to subset dataframe

Using the mtcars dataframe:

head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

While learning subsetting (I now know how to do it correctly with indexing or with subset), I was experimenting and ran this code:

head(mtcars[,-mtcars$drat])
                   mpg drat    wt  qsec vs am gear carb
Mazda RX4         21.0 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7 3.15 3.440 17.02  0  0    3    2
Valiant           18.1 2.76 3.460 20.22  1  0    3    1

What is the logic behind this output?

Upvotes: 2

Views: 45

Answers (1)

akrun

Reputation: 887118

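The negated 'drat' values are numeric, so they are coerced to integer (truncated toward zero) and then used as a negative column index, i.e. as the columns to remove. The integer part of each 'drat' value is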

as.integer(mtcars$drat)
#[1] 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 4 4 4 3 2 3 3 3 4 4 3 4 3 3 4

The coercion to integer is documented in ?Extract:

i, j - indices specifying elements to extract or replace. Indices are numeric or character vectors or empty (missing) or NULL. Numeric values are coerced to integer as by as.integer (and hence truncated towards zero).
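As a quick check of the quoted behaviour, the sketch below shows the truncation toward zero and how the negated 'drat' values collapse to just three negative indices. The object names neg_idx and same are only illustrative.

```r
# Truncation is toward zero, for positive and negative values alike
as.integer(c(3.9, 2.76, -3.9, -2.76))
#[1]  3  2 -3 -2

# The negated drat values collapse to three distinct negative indices
neg_idx <- as.integer(-mtcars$drat)
unique(neg_idx)
#[1] -3 -2 -4

# So the original call drops columns 2:4 (cyl, disp, hp)
same <- isTRUE(all.equal(mtcars[, -mtcars$drat], mtcars[, -(2:4)]))
same
#[1] TRUE
```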

i.e. we get the same output by removing the unique column indices (2, 3 and 4) obtained from the 'drat' column:

setdiff(seq_along(mtcars), as.integer(mtcars$drat))
#[1]  1  5  6  7  8  9 10 11
head(mtcars[setdiff(seq_along(mtcars), as.integer(mtcars$drat))])
#                   mpg drat    wt  qsec vs am gear carb
#Mazda RX4         21.0 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag     21.0 3.90 2.875 17.02  0  1    4    4
#Datsun 710        22.8 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout 18.7 3.15 3.440 17.02  0  0    3    2
#Valiant           18.1 2.76 3.460 20.22  1  0    3    1
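For contrast, if the intent had been to drop the 'drat' column itself, selecting by name sidesteps the coercion entirely. This is only a sketch of two common ways to do that; no_drat and no_drat2 are illustrative names.

```r
# Keep every column whose name is not "drat"
no_drat <- mtcars[, setdiff(names(mtcars), "drat")]
names(no_drat)

# subset() accepts a negated column name in its select argument
no_drat2 <- subset(mtcars, select = -drat)

# Both approaches should give the same data frame
isTRUE(all.equal(no_drat, no_drat2))
```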

However, if we remove the '-', i.e. select columns by that index instead of dropping them, the columns are duplicated, because the integer-converted index contains repeated values:

head(mtcars[as.integer(mtcars$drat)]) 
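To make the duplication concrete, this sketch counts how often each column is picked and checks the width of the result. The names idx and dup are only illustrative.

```r
idx <- as.integer(mtcars$drat)

# How many times each column is selected by the repeated indices
table(names(mtcars)[idx])
#
# cyl disp   hp 
#   3   22    7 

# One output column per index value, so 32 columns in total
dup <- mtcars[idx]
ncol(dup)
#[1] 32

# Repeated names are made unique by the data frame method (disp, disp.1, ...)
head(names(dup))
```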

Upvotes: 2
