Reputation: 13969
I have a large data frame that I would like to work with using the excellent dplyr package (Wickham), which I only recently discovered. I would like to filter out the columns that contain characters. Is this possible?
For example, in the flights dataset within the nycflights13 package, how could I filter out the columns that have class character?
library(nycflights13)
data(flights)
str(flights)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 336776 obs. of 16 variables:
$ year : int 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
$ month : int 1 1 1 1 1 1 1 1 1 1 ...
$ day : int 1 1 1 1 1 1 1 1 1 1 ...
$ dep_time : int 517 533 542 544 554 554 555 557 557 558 ...
$ dep_delay: num 2 4 2 -1 -6 -4 -5 -3 -3 -2 ...
$ arr_time : int 830 850 923 1004 812 740 913 709 838 753 ...
$ arr_delay: num 11 20 33 -18 -25 12 19 -14 -8 8 ...
$ carrier : chr "UA" "UA" "AA" "B6" ...
$ tailnum : chr "N14228" "N24211" "N619AA" "N804JB" ...
$ flight : int 1545 1714 1141 725 461 1696 507 5708 79 301 ...
$ origin : chr "EWR" "LGA" "JFK" "JFK" ...
$ dest : chr "IAH" "IAH" "MIA" "BQN" ...
$ air_time : num 227 227 160 183 116 150 158 53 140 138 ...
$ distance : num 1400 1416 1089 1576 762 ...
$ hour : num 5 5 5 5 5 5 5 5 5 5 ...
$ minute : num 17 33 42 44 54 54 55 57 57 58 ...
Any ideas?
Upvotes: 10
Views: 3175
Reputation: 10432
Here is a dplyr/tidyverse option using select_if() (shown on the dplyr starwars sample data):
starwars %>%
select_if(~!is.character(.)) %>%
head(2)
# A tibble: 2 x 6
height mass birth_year films vehicles starships
<int> <dbl> <dbl> <list> <list> <list>
1 172 77 19 <chr [5]> <chr [2]> <chr [2]>
2 167 75 112 <chr [6]> <chr [0]> <chr [0]>
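In dplyr 1.0.0 and later, select_if() is superseded in favour of select() combined with the where() selection helper. A minimal sketch of the same idea, assuming dplyr >= 1.0.0 is installed; the toy data frame df is made up for illustration and is not part of the question:

```r
library(dplyr)

# Hypothetical toy data: two character columns, two numeric ones
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(1.5, 2.5, 3.5),
  code  = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Keep every column for which is.character() is FALSE
non_chr <- df %>% select(!where(is.character))

names(non_chr)
```

The same call should work unchanged on flights, i.e. flights %>% select(!where(is.character)).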
Upvotes: 1
Reputation: 99371
I don't have the flights data, but this method worked on some other data I tried it on:
do(flights, Filter(Negate(is.character), .))
Of course, there's always base R. For this task it seems a bit easier:
Filter(Negate(is.character), flights)
Upvotes: 5
Reputation: 7474
You don't need dplyr for that; you can use base R:
flights[, !sapply(flights, is.character)]
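One caveat with the matrix-style [, cols] indexing: on a plain data.frame it drops to a vector when only one column is left, whereas tibbles such as flights always return a tibble. A small sketch of the difference, using a made-up data frame df and vapply() in place of sapply() so the predicate is guaranteed to be a logical vector:

```r
# Hypothetical toy data with a single non-character column
df <- data.frame(
  id   = 1:3,
  name = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# TRUE for each character column; vapply() guarantees a logical vector
is_chr <- vapply(df, is.character, logical(1))

dropped <- df[, !is_chr]  # matrix-style indexing: a plain vector here
kept    <- df[!is_chr]    # list-style indexing: always a data frame

is.data.frame(dropped)
is.data.frame(kept)
```

Writing df[, !is_chr, drop = FALSE] is another way to keep the data frame structure.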
Upvotes: 6
Reputation: 887891
You could try summarise_each from dplyr:
library(dplyr)
indx <- which(unlist(summarise_each(flights, funs(class))!='character'))
flights %>%
select(indx)
Upvotes: 8
Reputation: 27408
I don't think there's a dplyr shortcut for this, but you can get what you're after with:
flights %>% select(which(sapply(flights, class) != 'character'))
# Source: local data frame [336,776 x 12]
#
# year month day dep_time dep_delay arr_time arr_delay flight air_time distance hour minute
# 1 2013 1 1 517 2 830 11 1545 227 1400 5 17
# 2 2013 1 1 533 4 850 20 1714 227 1416 5 33
# 3 2013 1 1 542 2 923 33 1141 160 1089 5 42
# 4 2013 1 1 544 -1 1004 -18 725 183 1576 5 44
# 5 2013 1 1 554 -6 812 -25 461 116 762 5 54
# 6 2013 1 1 554 -4 740 12 1696 150 719 5 54
# 7 2013 1 1 555 -5 913 19 507 158 1065 5 55
# 8 2013 1 1 557 -3 709 -14 5708 53 229 5 57
# 9 2013 1 1 557 -3 838 -8 79 140 944 5 57
# 10 2013 1 1 558 -2 753 8 301 138 733 5 58
# .. ... ... ... ... ... ... ... ... ... ... ... ...
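Comparing class() against 'character' relies on every column having exactly one class. A column such as a POSIXct date-time has two, in which case sapply(x, class) returns a list rather than a character vector. A sketch of a predicate-based alternative, using a made-up data frame df with a date-time column:

```r
# Hypothetical toy data: 'when' has class c("POSIXct", "POSIXt")
df <- data.frame(
  when = as.POSIXct("2013-01-01 05:17:00", tz = "UTC") + 0:2,
  name = c("a", "b", "c"),
  n    = 1:3,
  stringsAsFactors = FALSE
)

# Not simplified to a vector, because 'when' contributes two class names
classes <- sapply(df, class)
is.list(classes)

# is.character() yields exactly one TRUE/FALSE per column
keep <- !vapply(df, is.character, logical(1))
names(df)[keep]
```

The resulting keep vector can be passed to select(which(keep)) or used directly as df[keep].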
Upvotes: 6