NewUsr_stat

Reputation: 2571

Subset character columns from a data frame of characters and numbers

I have a data frame composed of numeric and non-numeric columns.

I would like to extract (subset) only the non-numeric columns, that is, the character ones. I was able to subset the numeric columns with sub_num = x[sapply(x, is.numeric)], but I can't do the opposite using is.character. Can anyone help me?
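A minimal base R sketch of the approach described in the question, on hypothetical data. It assumes one likely cause of the problem: if the text columns were read in as factors (the default `stringsAsFactors = TRUE` before R 4.0.0), `is.character()` returns `FALSE` for them, so the `is.character` subset misses those columns.

```r
# Hypothetical data: a numeric column, a character column and a factor column
x <- data.frame(num = c(1.5, 2.5, 3.5),
                chr = c("a", "b", "c"),
                fct = factor(c("u", "v", "u")),
                stringsAsFactors = FALSE)

# Numeric columns, exactly as in the question
sub_num <- x[sapply(x, is.numeric)]

# The same pattern with is.character() finds only the true character column
sub_chr <- x[sapply(x, is.character)]

# Factor columns need their own test
sub_fct <- x[sapply(x, is.factor)]

names(sub_num)  # "num"
names(sub_chr)  # "chr"
names(sub_fct)  # "fct"
```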

Upvotes: 14

Views: 42116

Answers (6)

AlexB

Reputation: 3269

As of dplyr 1.0.0, you can use select() together with where():

library(dplyr)

starwars %>% 
  select(where(is.character))

You can swap is.character for is.numeric, is.factor, and so on.
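The same swap-the-predicate idea also works in base R without dplyr. A minimal sketch; `select_type()` and the data frame `dat` are hypothetical names used only for illustration. It uses `vapply()` rather than `sapply()` so that the index is always a logical vector with one entry per column.

```r
# Hypothetical helper: keep the columns of `df` for which `pred` returns TRUE
select_type <- function(df, pred) {
  keep <- vapply(df, pred, logical(1))  # one TRUE/FALSE per column
  df[keep]                              # column subset, still a data frame
}

dat <- data.frame(id = 1:3,
                  name = c("a", "b", "c"),
                  grp = factor(c("x", "y", "x")),
                  stringsAsFactors = FALSE)

names(select_type(dat, is.character))  # "name"
names(select_type(dat, is.numeric))    # "id"
names(select_type(dat, is.factor))     # "grp"
```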

Another option is the keep() or discard() functions from the purrr package:

starwars %>% 
  purrr::keep(~is.character(.)) 

starwars %>% 
  purrr::discard(~!is.character(.))

Upvotes: 3

sbha

Reputation: 10422

If you are trying to select only character columns, you can use dplyr::select_if() with is.character() (in dplyr 1.0.0 and later, select_if() is superseded by select(where(...))). Using the dplyr::starwars sample data as an example:

library(dplyr)
starwars %>% 
  select_if(is.character) %>% 
  head(2)
# A tibble: 2 x 7
  name           hair_color skin_color eye_color gender homeworld species
  <chr>          <chr>      <chr>      <chr>     <chr>  <chr>     <chr>  
1 Luke Skywalker blond      fair       blue      male   Tatooine  Human  
2 C-3PO          NA         gold       yellow    NA     Tatooine  Droid 

Or, if you want to exclude a certain column type instead, note that the syntax is slightly different: the negated predicate has to be written as a formula (lambda):

starwars %>%  
  select_if(~!is.numeric(.)) %>% 
  head(2)

# A tibble: 2 x 10
  name           hair_color skin_color eye_color gender homeworld species films     vehicles  starships
  <chr>          <chr>      <chr>      <chr>     <chr>  <chr>     <chr>   <list>    <list>    <list>
1 Luke Skywalker blond      fair       blue      male   Tatooine  Human   <chr [5]> <chr [2]> <chr [2]>
2 C-3PO          NA         gold       yellow    NA     Tatooine  Droid   <chr [6]> <chr [0]> <chr [0]>

Upvotes: 11

dondapati

Reputation: 849

Using the example data from Tyler Rinker's answer:

x <- data.frame(a=runif(10), b=1:10, c=letters[1:10], 
    d=as.factor(rep(c("A", "B"), each=5)), 
    e=as.Date(seq(as.Date("2000/1/1"), by="month", length.out=10)),
    stringsAsFactors = FALSE)

In base R:

base::Filter(Negate(is.numeric), x)

   c d          e
1  a A 2000-01-01
2  b A 2000-02-01
3  c A 2000-03-01
4  d A 2000-04-01
5  e A 2000-05-01
6  f B 2000-06-01
7  g B 2000-07-01
8  h B 2000-08-01
9  i B 2000-09-01
10 j B 2000-10-01
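Filter() works on a data frame because a data frame is a list of columns: it applies the predicate to each column and subsets with `[`, so the result is the same data frame that the sapply() approach gives. A quick check of that equivalence, rebuilding the example data so the block runs on its own (fixed values replace `runif()` in column `a`, an assumption made only for reproducibility):

```r
x <- data.frame(a = (1:10) / 10, b = 1:10, c = letters[1:10],
                d = as.factor(rep(c("A", "B"), each = 5)),
                e = seq(as.Date("2000/1/1"), by = "month", length.out = 10),
                stringsAsFactors = FALSE)

# Filter() keeps the list elements (columns) for which the predicate is TRUE
via_filter <- Filter(Negate(is.numeric), x)
via_sapply <- x[sapply(x, function(col) !is.numeric(col))]

identical(via_filter, via_sapply)  # TRUE
names(via_filter)                  # "c" "d" "e"
```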

Upvotes: 0

Hill Nguyen

Reputation: 21

The previous answers are not very clear, so I am posting this approach. To get the names of the character columns, you can do the following:

chrs <- sapply(df_data, is.character)  # one TRUE/FALSE per column
chrCols <- names(df_data)[chrs]        # names of the character columns
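A caveat on the form names(df_data[, chrs]): when there is exactly one character column, df_data[, chrs] collapses to a plain vector (the default drop = TRUE), and names() of that vector is NULL. Indexing the names vector directly avoids this. A small sketch with hypothetical data:

```r
# Hypothetical data with a single character column
df_data <- data.frame(id = 1:3, label = c("a", "b", "c"),
                      stringsAsFactors = FALSE)
chrs <- sapply(df_data, is.character)

names(df_data[, chrs])  # NULL: the single column was dropped to a vector
names(df_data)[chrs]    # "label"
names(which(chrs))      # "label", an equivalent alternative
```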

Upvotes: 1

Tyler Rinker

Reputation: 109844

Try:

x[sapply(x, function(x) !is.numeric(x))]

This pulls anything that is not numeric, so both factors and characters (and any other non-numeric class, such as Date).

EDIT:

x <- data.frame(a=runif(10), b=1:10, c=letters[1:10], 
    d=as.factor(rep(c("A", "B"), each=5)), 
    e=as.Date(seq(as.Date("2000/1/1"), by="month", length.out=10)),
    stringsAsFactors = FALSE)

# > str(x)
# 'data.frame':   10 obs. of  5 variables:
#  $ a: num  0.814 0.372 0.732 0.522 0.626 ...
#  $ b: int  1 2 3 4 5 6 7 8 9 10
#  $ c: chr  "a" "b" "c" "d" ...
#  $ d: Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
#  $ e: Date, format: "2000-01-01" "2000-02-01" ...

x[sapply(x, function(x) !is.numeric(x))]
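A caveat on "anything not numeric": with this x, the Date column e comes back alongside c and d, because is.numeric() is FALSE for Date objects. A sketch that instead tests for text-like columns explicitly (fixed values replace `runif()` in column `a`, an assumption made only for reproducibility):

```r
x <- data.frame(a = (1:10) / 10, b = 1:10, c = letters[1:10],
                d = as.factor(rep(c("A", "B"), each = 5)),
                e = seq(as.Date("2000/1/1"), by = "month", length.out = 10),
                stringsAsFactors = FALSE)

# Everything that is not numeric, including the Date column
not_numeric <- x[sapply(x, function(col) !is.numeric(col))]

# Only character or factor columns
text_only <- x[sapply(x, function(col) is.character(col) || is.factor(col))]

names(not_numeric)  # "c" "d" "e"
names(text_only)    # "c" "d"
```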

Upvotes: 3

Thilo

Reputation: 9157

OK, I quickly tried out my idea.

I can confirm that the following code works:

str(d)
# 'data.frame':  5 obs. of  3 variables:
#  $ a: int  1 2 3 4 5
#  $ b: chr  "a" "a" "a" "a" ...
#  $ c: Factor w/ 1 level "b": 1 1 1 1 1


# Get all character columns (drop = FALSE keeps a data frame even if only one column matches)
d[, sapply(d, class) == 'character', drop = FALSE]

# Or, for factors, which might be likely:
d[, sapply(d, class) == 'factor', drop = FALSE]

# If you want to get both factors and characters use
d[, sapply(d, class) %in% c('character', 'factor'), drop = FALSE]
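A self-contained version of the snippet above. The data frame d is reconstructed from the str() output, which is an assumption since the original data was not shown. Because d has only one character column, d[, ...] returns a plain vector unless drop = FALSE is given, while d[...] without the comma always returns a data frame.

```r
# d reconstructed from the str() output (assumption)
d <- data.frame(a = 1:5, b = rep("a", 5), c = factor(rep("b", 5)),
                stringsAsFactors = FALSE)

is_chr <- sapply(d, class) == "character"

class(d[, is_chr])                # "character": dropped to a vector
class(d[, is_chr, drop = FALSE])  # "data.frame"
class(d[is_chr])                  # "data.frame"

# Both factors and characters
names(d[sapply(d, class) %in% c("character", "factor")])  # "b" "c"
```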

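Using the correct class, your sapply approach should work as well. No comma is needed there: x[sapply(x, ...)] indexes the columns directly and always returns a data frame, whereas the d[, ...] form needs drop = FALSE when only one column matches.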

The !is.numeric approach does not scale well if you have columns whose classes fall outside the numeric/factor/character group (POSIXct, which I use very often, is one example), because those columns are not numeric and are picked up as well.
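A related caveat with the class() comparison: a POSIXct column has two classes ("POSIXct" "POSIXt"), so sapply(d, class) returns a list instead of a character vector once such a column is present. A sketch using inherits(), which checks whether any of an object's classes match, on hypothetical data d2 with a POSIXct column:

```r
# Hypothetical data with an integer, a character and a POSIXct column
d2 <- data.frame(n = 1:2, s = c("a", "b"),
                 t = as.POSIXct(c("2020-01-01 10:00", "2020-01-02 11:00"), tz = "UTC"),
                 stringsAsFactors = FALSE)

class(d2$t)                 # "POSIXct" "POSIXt"
is.list(sapply(d2, class))  # TRUE: class vectors of unequal length

# inherits() gives one TRUE/FALSE per column, however many classes it has
pick <- vapply(d2, inherits, logical(1), what = c("character", "factor"))
names(d2[pick])             # "s"

# !is.numeric, by contrast, also picks up the POSIXct column
names(d2[!vapply(d2, is.numeric, logical(1))])  # "s" "t"
```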

Upvotes: 12
