Reputation: 4040
Given:
df <- structure(list(word = c("aaliyahmaxwell", "abasc", "abbslovesfed",
"abbycastro", "abc", "abccarpet", "abdul", "ability", "abnormile",
"abraham"), chardonnay = c(4, 0, 0, 0, 0, 0, 0, 0, 0, 0), coffee = c(0,
1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("word", "chardonnay",
"coffee"), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
Why does df %>% filter_all(all_vars(. > 0)) work?
My first column is of type character, so it cannot meaningfully be > 0. I understand why it works on the other two columns, but I need an explanation of why it works when there is a mixture of character and double columns.
Please advise.
Upvotes: 1
Views: 654
Reputation: 2867
Even though there is already a good answer, I think this can be made clearer with an example:
> c("a", 0)
[1] "a" "0"
Here you can see what happens: the number gets coerced to a character.
Characters get compared lexically. Example:
> "b" > "a"
[1] TRUE
> "a" > "5"
[1] TRUE
> charvector <- sample(c(seq(1,9), LETTERS))
> charvector
[1] "6" "D" "T" "U" "I" "R" "F" "S" "J" "W" "B" "A" "8" "E" "2" "7" "O" "Z" "V" "G" "9" "4" "H" "C" "Y" "1" "X" "5" "M" "K" "Q" "L" "N" "3" "P"
The order also becomes clear when you sort that vector:
> sort(charvector)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
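One consequence is worth spelling out with a small sketch. Because the number is turned into a string first, the comparison follows string order and not numeric order. The values below are made up for illustration and are not from the question's data.

```r
# Numeric comparison: 10 is greater than 9.
10 > 9        # TRUE

# Mixed comparison: 10 is coerced to "10" and then compared as a string.
# "1" sorts before "9", so the result flips.
10 > "9"      # FALSE

# The same thing happens when sorting numbers that are stored as strings.
sort(c("10", "9", "2"))   # "10" "2" "9"
```

So a mixed comparison does not raise an error, but it may not answer the question you meant to ask.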
Upvotes: 2
Reputation: 887501
It is due to type coercion. Here, the numeric entry 0 gets converted to a character one. According to `?Comparison`:
If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
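As a minimal sketch of that rule (the values are chosen only for illustration), the lower-precedence operand is converted to the type of the other before the comparison is made:

```r
# The number is converted to a string before comparing,
# so these two comparisons are the same thing.
identical("a" > 0, "a" > as.character(0))   # TRUE

# Equality follows the same rule: 1 becomes "1".
1 == "1"      # TRUE
1 == "1.0"    # FALSE, because as.character(1) is "1", not "1.0"

# A logical compared with a string is converted to "TRUE" / "FALSE".
TRUE == "TRUE"   # TRUE
```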
So, filtering on the 'word' column alone
df %>%
  filter(word > 0)
gives all the rows of the original data because
letters > 0
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#[26] TRUE
The 'word' column holds only character strings, all of which compare greater than "0" after type conversion, so all_vars effectively only checks whether the other, numeric columns are greater than 0.
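The idea can be sketched in base R. This is only an illustration of the logic, not how dplyr implements all_vars, and the data frame `toy` is made up for the example:

```r
# A small made-up data frame with one character and two numeric columns.
toy <- data.frame(
  word = c("abc", "abdul", "ability"),
  chardonnay = c(4, 0, 2),
  coffee = c(1, 1, 0),
  stringsAsFactors = FALSE
)

# Apply the predicate to every column; each result is a logical vector.
checks <- lapply(toy, function(col) col > 0)

# The character column passes for every row, because 0 is coerced to "0".
checks$word          # TRUE TRUE TRUE

# A row is kept only if all columns pass, which is what all_vars() expresses.
keep <- Reduce(`&`, checks)
keep                 # TRUE FALSE FALSE
toy[keep, ]
```

Since the 'word' check is TRUE in every row, it never changes the outcome. Only the numeric columns decide which rows survive.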
In the OP's example dataset, none of the rows match the criteria because in every row one of the numeric columns is 0. If we change the first row of 'coffee' to 2 or 1, that row is picked up because 'chardonnay' is greater than 0 there and the first column 'word' always compares greater.
df$coffee[1] <- 2
df %>%
  filter_all(all_vars(. > 0))
# A tibble: 1 x 3
# word chardonnay coffee
# <chr> <dbl> <dbl>
#1 aaliyahmaxwell 4 2
To apply the condition to only the numeric columns, use filter_if (as in the comments):
df %>%
  filter_if(is.numeric, all_vars(. > 0))
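In more recent dplyr releases the scoped verbs such as filter_if are marked as superseded, and the same filter can be written with if_all(). The following is a sketch that assumes dplyr 1.0.4 or later is installed. The tibble `toy` and the name `result` are made up for illustration.

```r
library(dplyr)

# Made-up stand-in for the question's data.
toy <- tibble(
  word = c("abc", "abdul", "ability"),
  chardonnay = c(4, 0, 2),
  coffee = c(1, 1, 0)
)

# Keep rows where every numeric column is greater than 0;
# the character column is never compared.
result <- toy %>%
  filter(if_all(where(is.numeric), ~ .x > 0))

result
```

With this toy data only the first row ("abc") has both numeric columns above 0, so it is the only row kept.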
Upvotes: 2