Tony

Reputation: 542

Indexing redundantly named vector in R

In R, given a vector with duplicate names, why can't the selection operator retrieve all elements that share a name?

v <- c(1,2,3,4,5)
names(v) <- c("a","b","c","c","a")
v["c"] ## Returns only 3, not c(3,4)

It looks like R assumes that vector names are unique and retrieves only the first element whose name matches the argument to the selection operator.

Is this some kind of optimization? Would it not be beneficial if we were able to select multiple elements in a vector with the same name attribute? Is the point to guarantee that the number of elements returned when using the indexing operator is the same as the number of elements in the indexing vector?

Upvotes: 3

Views: 332

Answers (2)

BenBarnes

Reputation: 19454

This is an educated guess, so confirmation or disproval is welcome. (confirmation below)

From ?"[": "Character vectors will be matched to the names of the object".

> match("c",names(v))
[1] 3

You could get around this by using:

> v[names(v)%in%"c"]
c c 
3 4 
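The same workaround extends to several names at once, and `which()` gives the matching positions if those are more useful than the values. A minimal sketch, reusing the vector from the question (the variable names `wanted`, `all_matches`, and `c_positions` are only illustrative):

```r
v <- c(1, 2, 3, 4, 5)
names(v) <- c("a", "b", "c", "c", "a")

# Logical mask: TRUE wherever the element's name is one of the wanted names.
wanted <- c("a", "c")
all_matches <- v[names(v) %in% wanted]   # elements 1, 3, 4 and 5, in original order

# Integer positions of every element named "c".
c_positions <- which(names(v) == "c")    # positions 3 and 4
```

Note that the result keeps the order of `v`, not the order of `wanted`.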

EDIT: [ is a Primitive function, so it isn't actually using match. The source code holds the answer, but I haven't found it yet.

EDIT2:

The answer from the source code: The R function [ calls the C function do_subset, which can be found in the source file ./src/main/subset.c. In the example you gave, the C function stringSubscript eventually gets called, and this iterates over each name of the vector being subset (v in this case) until it finds a match. At that point, the iteration is stopped and the corresponding index and name are returned.

Therefore, only the value corresponding to the first matching name is returned when you subset using v["a"]. For that reason it is advisable to follow the suggestions in the other answer and the comments and use unique names :)
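The first-match behaviour described above can be imitated in plain R. This is only a sketch of the idea, not R's actual C code; `first_match_index` is a hypothetical helper written for illustration:

```r
# Hypothetical helper: walk over the names and stop at the first match,
# mimicking the behaviour described above (not the real C implementation).
first_match_index <- function(x, key) {
  nms <- names(x)
  for (i in seq_along(nms)) {
    if (identical(nms[i], key)) {
      return(i)  # stop iterating as soon as a name matches
    }
  }
  NA_integer_    # no name matched
}

v <- c(1, 2, 3, 4, 5)
names(v) <- c("a", "b", "c", "c", "a")

i <- first_match_index(v, "c")
first_c <- v[i]   # the same single element that v["c"] returns
```

Because the loop returns at the first hit, later elements with the same name are never reached, which is why `v["c"]` yields one element rather than two.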

Upvotes: 3

John

Reputation: 23758

You don't want to use names for what you're trying to do. You're making a categorical variable, not naming each item uniquely. This is an important semantic distinction.

v <- c(1,2,3,4,5)
cat_v <- c("a","b","c","c","a")
v[cat_v == 'c'] ## Returns c(3,4)
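If every category is needed rather than just one, a separate grouping vector also works with `split()`. A minimal sketch along the same lines (the name `groups` is only illustrative):

```r
v <- c(1, 2, 3, 4, 5)
cat_v <- c("a", "b", "c", "c", "a")

# Split the values into one vector per category.
groups <- split(v, cat_v)

groups$c   # the values in category "c": 3 and 4
groups$a   # the values in category "a": 1 and 5
```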

Upvotes: 2
