Given a simple data frame:
df <-
  structure(
    list(
      lowercase = c("j", "t", "u"),
      uppercase = c("J", "T", "U")
    ),
    row.names = c("10", "20", "21"),
    class = "data.frame"
  )
> df
   lowercase uppercase
10         j         J
20         t         T
21         u         U
Selecting with a row name that does not exist usually returns a data frame of NAs:
> df["2",]
   lowercase uppercase
NA      <NA>      <NA>
... but not always:
> df["1",]
   lowercase uppercase
10         j         J
Why does subsetting a data frame sometimes return rows for which there is no (exact) matching row.name?
I've tried this on both Linux (CentOS) and macOS, using R versions 3.1.2, 3.2.3, 3.6.0, and 4.0.3, with the same results.
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: ~/tools/lib64/R/lib/libRblas.so
LAPACK: ~/tools/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.0.2
After a much closer reading of the Extract.data.frame manual page, I see that square-bracket indexing partially matches row names.
The example given in the manual is:
> sw <- swiss[1:5, 1:4]
> sw
             Fertility Agriculture Examination Education
Courtelary        80.2        17.0          15        12
Delemont          83.1        45.1           6         9
Franches-Mnt      92.5        39.7           5         5
Moutier           85.8        36.5          12         7
Neuveville        76.9        43.5          17        15
> sw["C",]
           Fertility Agriculture Examination Education
Courtelary      80.2          17          15        12
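This also explains why "1" and "2" behave differently in the question. As far as I can tell, character row indexing on a data frame behaves like pmatch() applied to the row names, so a unique prefix matches while an ambiguous one yields NA. A minimal sketch, assuming that is the mechanism (the vector rn is just the question's row names copied out for illustration):

```r
# Row names from the question's data frame (copied here for illustration).
rn <- c("10", "20", "21")

# "1" is a prefix of exactly one name ("10"), so it matches position 1.
unique_prefix <- pmatch("1", rn, duplicates.ok = TRUE)

# "2" is a prefix of both "20" and "21"; an ambiguous partial match gives NA.
ambiguous_prefix <- pmatch("2", rn, duplicates.ok = TRUE)

# An exact name always wins over partial matches.
exact_name <- pmatch("20", rn, duplicates.ok = TRUE)

print(c(unique_prefix, ambiguous_prefix, exact_name))
```

So df["1", ] picks up row "10", while df["2", ] cannot choose between "20" and "21" and falls back to a row of NAs.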
The recommended solution is to use match:
> sw[match("C", row.names(sw)), ]
   Fertility Agriculture Examination Education
NA        NA          NA          NA        NA
Bringing this back to the question posed above, the exact-match approach would be:
df[match("1", row.names(df)), ]
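To check this against the data frame from the question, here is a small sketch. The data frame is rebuilt with data.frame() for brevity, and the %in% variant at the end is my own addition rather than something the manual page suggests; it may be preferable when non-matching names should be dropped instead of turned into NA rows.

```r
# Data frame from the question, rebuilt with data.frame() for brevity.
df <- data.frame(
  lowercase = c("j", "t", "u"),
  uppercase = c("J", "T", "U"),
  row.names = c("10", "20", "21"),
  stringsAsFactors = FALSE
)

# match() returns NA when no row name is exactly "1",
# so the result is a single row of NAs rather than row "10".
no_hit <- df[match("1", row.names(df)), ]

# An exact row name still selects its row as expected.
hit <- df[match("10", row.names(df)), ]

# Alternative (an assumption of mine, not from the manual): a logical mask
# keeps only rows whose names match exactly and drops everything else.
masked <- df[row.names(df) %in% c("1", "10"), ]

print(no_hit)
print(hit)
print(masked)
```

With match, a missing name shows up as an NA row, which keeps the result the same length as the request; with the logical mask, missing names simply disappear from the result.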