Reputation: 7190
Maybe it is a silly question, but while playing with subsetting I ran into this and I can't understand why it happens.
For example, consider a string, say "a", and an integer, say 3. Why does this expression return TRUE?
"a" >= 3
[1] TRUE
Upvotes: 3
Views: 199
Reputation: 6931
When you compare a string to a number, R coerces the number to a string, so 3 becomes "3".
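As a quick check of that coercion, here is a small sketch (the variable name `x` is just for illustration). It shows that comparing "a" with the number gives the same result as comparing it with the number's character form:

```r
x <- 3

# The number is converted to a character string before the comparison.
coerced <- as.character(x)
print(coerced)

# Comparing against the number and against its string form is the same operation.
same_result <- identical("a" >= x, "a" >= coerced)
print(same_result)
```

The same coercion means that strings that look like numbers are still compared as text, so it is usually safer to convert explicitly with `as.numeric()` when a numeric comparison is intended.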
Applying comparison operators to strings tests the condition according to their alphabetical (collation) order. For example:
> "a" < "b"
[1] TRUE
> "b" > "c"
[1] FALSE
These results occur because, for R, the ascending order is a, b, c. Digits usually come before letters in alphabetical order (just check files ordered by name that start with a number). This is why you get:
"a" >= 3
[1] TRUE
Finally, note that your result can vary depending on your locale and how alphabetical order is defined in it. The manual (see ?Comparison) says:
Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales. The collating sequence of locales such as en_US is normally different from C (which should use ASCII) and can be surprising. Beware of making any assumptions about the collation order: e.g. in Estonian Z comes between S and T, and collation is not necessarily character-by-character – in Danish aa sorts as a single letter, after z. In Welsh ng may or may not be a single sorting unit: if it is it follows g. Some platforms may not respect the locale and always sort in numerical order of the bytes in an 8-bit locale, or in Unicode code-point order for a UTF-8 locale (and may not sort in the same order for the same language in different character sets). Collation of non-letters (spaces, punctuation signs, hyphens, fractions and so on) is even more problematic.
This is important and should be kept in mind whenever comparison operators are used on strings (whether or not they are being compared to numbers).
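If you need results that do not depend on the user's locale, one option is to switch the collation to the C locale temporarily, which compares strings by byte value. This is only a sketch, and the variable names are made up for illustration:

```r
# Remember the current collation setting so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))

# In the C locale the order follows byte values:
# digits (48-57) < uppercase letters (65-90) < lowercase letters (97-122).
letter_after_digit <- "a" >= "3"
upper_before_lower <- "B" < "a"

# Restore the original collation setting.
invisible(Sys.setlocale("LC_COLLATE", old_collate))

print(c(letter_after_digit, upper_before_lower))
```

In the C locale both comparisons are TRUE. The second one may come out differently in a locale such as en_US, where "a" typically sorts before "B".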
Upvotes: 8