jonas

Reputation: 13969

Use dplyr to filter out columns containing characters

I have a large data frame that I would like to work with using dplyr, an excellent package (Wickham) that I only recently discovered. I would like to filter out the columns that contain characters. Is this possible?

For example, in the flights dataset from the nycflights13 package, how could I filter out the columns of class character?

library(nycflights13)
data(flights)
str(flights)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   336776 obs. of  16 variables:
 $ year     : int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
 $ month    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ day      : int  1 1 1 1 1 1 1 1 1 1 ...
 $ dep_time : int  517 533 542 544 554 554 555 557 557 558 ...
 $ dep_delay: num  2 4 2 -1 -6 -4 -5 -3 -3 -2 ...
 $ arr_time : int  830 850 923 1004 812 740 913 709 838 753 ...
 $ arr_delay: num  11 20 33 -18 -25 12 19 -14 -8 8 ...
 $ carrier  : chr  "UA" "UA" "AA" "B6" ...
 $ tailnum  : chr  "N14228" "N24211" "N619AA" "N804JB" ...
 $ flight   : int  1545 1714 1141 725 461 1696 507 5708 79 301 ...
 $ origin   : chr  "EWR" "LGA" "JFK" "JFK" ...
 $ dest     : chr  "IAH" "IAH" "MIA" "BQN" ...
 $ air_time : num  227 227 160 183 116 150 158 53 140 138 ...
 $ distance : num  1400 1416 1089 1576 762 ...
 $ hour     : num  5 5 5 5 5 5 5 5 5 5 ...
 $ minute   : num  17 33 42 44 54 54 55 57 57 58 ...

Any ideas?

Upvotes: 10

Views: 3175

Answers (5)

sbha

Reputation: 10432

Here is a dplyr/tidyverse option using select_if() (using the dplyr starwars sample data):

starwars %>% 
  select_if(~!is.character(.)) %>% 
  head(2)

# A tibble: 2 x 6
    height  mass birth_year films     vehicles  starships
     <int> <dbl>      <dbl> <list>    <list>    <list>   
  1    172    77         19 <chr [5]> <chr [2]> <chr [2]>
  2    167    75        112 <chr [6]> <chr [0]> <chr [0]>
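In dplyr 1.0.0 and later, select_if() is superseded by select() combined with where(). A minimal sketch of the same idea, assuming a recent dplyr is installed; the toy tibble `df` and its column names are hypothetical stand-ins for the real data:

```r
library(dplyr)

# Hypothetical toy data: two non-character columns and one character column
df <- tibble(
  id      = 1:3,
  carrier = c("UA", "AA", "B6"),
  delay   = c(2, 4, -1)
)

# Keep every column for which is.character() is FALSE
no_chr <- df %>% select(!where(is.character))

names(no_chr)
```

The predicate is applied once per column, so this scales to wide data frames without naming any column explicitly.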

Upvotes: 1

Rich Scriven

Reputation: 99371

I don't have the flights data, but this method worked on some other data I experimented with:

do(flights, Filter(Negate(is.character), .))

Of course, there's always base R, which seems a bit easier for this task:

Filter(Negate(is.character), flights)

Upvotes: 5

Tim

Reputation: 7474

You don't need dplyr for that, you can use base R:

flights[, !sapply(flights, is.character)]
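One caveat with matrix-style indexing: on a plain data.frame (unlike a tbl_df such as flights), `df[, j]` collapses to a vector when only one column survives. A small sketch of the difference, using a hypothetical toy data frame `df`:

```r
# Hypothetical toy data frame with a single non-character column
df <- data.frame(
  id      = 1:3,
  carrier = c("UA", "AA", "B6"),
  origin  = c("EWR", "JFK", "LGA"),
  stringsAsFactors = FALSE
)

# TRUE for each character column
is_chr <- sapply(df, is.character)

# Matrix-style indexing: the single remaining column is dropped to a vector
as_vector <- df[, !is_chr]

# List-style indexing: the result is always a data frame
as_df <- df[!is_chr]
```

Writing `df[, !is_chr, drop = FALSE]` is another way to keep the data frame structure.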

Upvotes: 6

akrun

Reputation: 887891

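You could try summarise_each from dplyr:

library(dplyr)
# Get the class of each column, then keep the positions that are not 'character'
indx <- which(unlist(summarise_each(flights, funs(class)) != 'character'))
# Select the columns by position
flights %>%
  select(indx)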

Upvotes: 8

jbaums

Reputation: 27408

I don't think there's a dplyr shortcut for this, but you can get what you're after with:

flights %>% select(which(sapply(flights, class) != 'character'))

# Source: local data frame [336,776 x 12]
# 
#    year month day dep_time dep_delay arr_time arr_delay flight air_time distance hour minute
# 1  2013     1   1      517         2      830        11   1545      227     1400    5     17
# 2  2013     1   1      533         4      850        20   1714      227     1416    5     33
# 3  2013     1   1      542         2      923        33   1141      160     1089    5     42
# 4  2013     1   1      544        -1     1004       -18    725      183     1576    5     44
# 5  2013     1   1      554        -6      812       -25    461      116      762    5     54
# 6  2013     1   1      554        -4      740        12   1696      150      719    5     54
# 7  2013     1   1      555        -5      913        19    507      158     1065    5     55
# 8  2013     1   1      557        -3      709       -14   5708       53      229    5     57
# 9  2013     1   1      557        -3      838        -8     79      140      944    5     57
# 10 2013     1   1      558        -2      753         8    301      138      733    5     58
# ..  ...   ... ...      ...       ...      ...       ...    ...      ...      ...  ...    ...
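Comparing `sapply(df, class)` against a string assumes every column has exactly one class. If any column has several (a POSIXct date-time, for instance), sapply() returns a list instead of a character vector. A sketch of a predicate-based variant that sidesteps this, using a hypothetical toy data frame:

```r
# Hypothetical toy data: `when` has two classes (POSIXct and POSIXt)
df <- data.frame(
  when    = as.POSIXct(c("2013-01-01 05:00:00", "2013-01-01 06:00:00"), tz = "UTC"),
  carrier = c("UA", "AA"),
  stringsAsFactors = FALSE
)

# A list rather than a character vector, because the class lengths differ
classes <- sapply(df, class)

# is.character() returns exactly one logical per column regardless of class length
keep <- !vapply(df, is.character, logical(1))

result <- df[keep]
```

With dplyr loaded, the same logical vector can be used as `df %>% select(which(keep))`.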

Upvotes: 6
