Reputation: 13
I'm stumped. With two different datasets, square brackets are returning different types of results and causing problems. In one case, the output is a list, and in the other, it's a set of values (is there another term?).
ds1 = dataset 1; ds2 = dataset 2
typeof(ds1$col)
[1] "integer"
mean(ds1$col)
[1] 0.51
mean(ds1[ds1$col2==1,'col'])
[1] 0.52
typeof(ds2$col)
[1] "double"
mean(ds2$col)
[1] 0.53
mean(ds2[ds2$col2==1,'col'])
[1] NA
Warning message:
In mean.default(cmv2[cmv2$post_num == 1, "accurate"]) :
argument is not numeric or logical: returning NA
This was how I discovered the problem. To see what was up with the output, I did the following:
> ds1[ds1$col2==1,'col']
[1] 1 1 0 0 0 0
> ds2[ds2$col2==1,'col']
# A tibble: 19 × 1
   accurate
      <dbl>
 1        1
 2        0
 3        0
I don't want the output to be a list. What do I do? Thanks!
Upvotes: 1
Views: 535
Reputation: 44838
In answer to your question in the title: yes, selecting subsets with square brackets is inconsistent in R.
The reason for this is that objects can have different "classes", and code can be defined for what square brackets do depending on the class. Run
class(ds1)
class(ds2)
and I'm sure you'll see different output.
I'd guess from what you showed us that class(ds1) will give "data.frame", while class(ds2) will give something like c("tbl_df", "tbl", "data.frame"), because it is a "tibble". This three-element result indicates that R will look for square bracket methods in the classes in order from left to right, and it will find one for "tbl_df", so it won't use the same method as the "data.frame" class uses.
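To make the dispatch idea concrete, here is a minimal sketch in base R only. The class name "keep_df" and the data are invented for illustration; this is not how tibble is actually implemented, just a toy class whose square bracket method never simplifies to a vector.

```r
# Plain data frame with made-up values.
df <- data.frame(col = c(1, 0, 1), col2 = c(1, 1, 2))

# Same data, but with a hypothetical class placed in front of "data.frame".
kdf <- structure(df, class = c("keep_df", "data.frame"))

# Square bracket method for the toy class. R finds this first because
# "keep_df" comes before "data.frame" in the class vector.
`[.keep_df` <- function(x, i, j, ...) {
  class(x) <- "data.frame"   # fall back to the ordinary data.frame method
  x[i, j, drop = FALSE]      # but never simplify the result to a vector
}

plain_result <- df[df$col2 == 1, "col"]    # data.frame method: bare vector
toy_result   <- kdf[kdf$col2 == 1, "col"]  # toy method: one-column data frame

class(kdf)
is.numeric(plain_result)
is.data.frame(toy_result)
```

The two calls look identical, but the first returns a numeric vector and the second a one-column data frame, which is the same kind of difference you are seeing between ds1 and ds2.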
You can make results consistent by converting ds2 to a dataframe using ds2 <- as.data.frame(ds2). You can also tell the tibble method to act like the dataframe method by using ds2[ds2$col2 == 1, 'col', drop = TRUE] (as suggested by @Maël, though you should always use TRUE, not T). The tibble method defaults to drop = FALSE, so a one-column selection stays a tibble unless you ask for it to be dropped to a vector.
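Here is a short sketch of ways to get a plain vector back from a tibble. It assumes the tibble package is installed, and the data are made up to stand in for ds2; the variable names are only for illustration.

```r
library(tibble)  # assumption: the tibble package is available

# Made-up stand-in for ds2.
ds2 <- tibble(col = c(1, 0, 0, 1), col2 = c(1, 1, 1, 2))
rows <- ds2$col2 == 1

# Default tibble behaviour: a one-column tibble, which mean() cannot handle.
still_tibble <- ds2[rows, "col"]

# Option 1: ask the tibble method to drop to a vector.
via_drop <- ds2[rows, "col", drop = TRUE]

# Option 2: pull the column out first, then subset the vector.
via_double <- ds2[["col"]][rows]
via_dollar <- ds2$col[rows]

# Option 3: convert to a plain data frame before subsetting.
via_df <- as.data.frame(ds2)[rows, "col"]

mean(via_drop)
```

Options 1 and 2 leave ds2 as a tibble, so they are probably the safer choice if other code relies on tibble behaviour. Extracting the column first (option 2) also returns a vector for both data frames and tibbles, so it avoids depending on the drop default at all.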
Upvotes: 4