odenhem

Reputation: 101

R - split data frame without removing NA values

If I have a df:

letter    body_part
    a     head
    b     head
    c     NA
    d     NA
    e     left_foot

I want to split it into two data frames: one containing only the rows where body_part is "head", and the other containing everything else, e.g.:

list <- split(df, df$body_part == 'head')

Can I do that without dropping the NA rows? (I know I can do it if I fill the NAs with a string, but is there a way that avoids that step?)
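To see why the rows disappear, here is a minimal sketch that rebuilds the question's data with `data.frame()` (the names `df` and `parts` are only illustrative). The comparison returns `NA`, not `FALSE`, for the missing body parts, and `split()` drops every row whose grouping value is `NA`:

```r
# Rebuild the example data from the question
df <- data.frame(
  letter    = c("a", "b", "c", "d", "e"),
  body_part = c("head", "head", NA, NA, "left_foot"),
  stringsAsFactors = FALSE
)

# NA (not FALSE) for rows c and d: gives TRUE TRUE NA NA FALSE
cmp <- df$body_part == "head"

# split() leaves out rows whose grouping value is NA,
# so only 3 of the 5 rows end up in the result
parts <- split(df, cmp)
sapply(parts, nrow)  # FALSE group: 1 row, TRUE group: 2 rows
```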

Upvotes: 2

Views: 1050

Answers (3)

moodymudskipper

Reputation: 47350

You can convert the f argument of split() to a factor with exclude = NULL, so that NA is kept as a level of its own instead of being excluded.

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
letter    body_part
    a     head
    b     head
    c     NA
    d     NA
    e     left_foot")

split(df, factor(df$body_part, exclude = NULL))
#> $head
#>   letter body_part
#> 1      a      head
#> 2      b      head
#> 
#> $left_foot
#>   letter body_part
#> 5      e left_foot
#> 
#> $<NA>
#>   letter body_part
#> 3      c      <NA>
#> 4      d      <NA>
split(df, factor(df$body_part, exclude = NULL) == 'head')
#> $`FALSE`
#>   letter body_part
#> 3      c      <NA>
#> 4      d      <NA>
#> 5      e left_foot
#> 
#> $`TRUE`
#>   letter body_part
#> 1      a      head
#> 2      b      head

Created on 2019-10-14 by the reprex package (v0.3.0)
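Base R also has `addNA()`, which turns a vector into a factor and appends NA as an explicit level; as far as I can tell it has the same effect here as `factor(x, exclude = NULL)`. A minimal sketch, assuming the question's data rebuilt with `data.frame()` (`f`, `by_part` and `by_head` are illustrative names):

```r
df <- data.frame(
  letter    = c("a", "b", "c", "d", "e"),
  body_part = c("head", "head", NA, NA, "left_foot"),
  stringsAsFactors = FALSE
)

# Factor whose levels are "head", "left_foot" and an explicit NA level
f <- addNA(df$body_part)

# One group per level, including the NA group: all 5 rows are kept
by_part <- split(df, f)

# Two groups: comparing the factor with "head" gives FALSE (not NA)
# for elements in the NA level, so rows c and d land in the FALSE group
by_head <- split(df, f == "head")
```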

Upvotes: 1

thelatemail

Reputation: 93938

Use %in% instead of ==. From ?`%in%`:

That ‘%in%’ never returns ‘NA’ makes it particularly useful in ‘if’ conditions.

# add a column only to show what the `==` comparison returns for each row
> df$s_col <- df$body_part == 'head'

> split(df, df$body_part %in% 'head')
$`FALSE`
  letter body_part s_col
3      c      <NA>    NA
4      d      <NA>    NA
5      e left_foot FALSE

$`TRUE`
  letter body_part s_col
1      a      head  TRUE
2      b      head  TRUE
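The difference between the two operators on a vector containing NA can be checked directly. This sketch uses an illustrative vector `x` with the question's values; ?`%in%` documents the operator as `match(x, table, nomatch = 0) > 0`, which is why an unmatched NA becomes `FALSE`:

```r
x <- c("head", "head", NA, NA, "left_foot")

eq  <- x == "head"    # TRUE TRUE NA NA FALSE: NA propagates
inn <- x %in% "head"  # TRUE TRUE FALSE FALSE FALSE: never NA

# NA has no match in "head", so match() returns the nomatch value 0,
# and 0 > 0 is FALSE
via_match <- match(x, "head", nomatch = 0) > 0
identical(inn, via_match)
```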

Upvotes: 5

Shahab Einabadi

Reputation: 342

Replace the NA values in the logical index with FALSE, then split on it:

> ind <- df$body_part == 'head'
> ind[is.na(ind)] <- FALSE
> split(df, ind)
$`FALSE`
# A tibble: 3 x 2
  letter body_part
   <chr>     <chr>
1      c      <NA>
2      d      <NA>
3      e left_foot

$`TRUE`
# A tibble: 2 x 2
  letter body_part
   <chr>     <chr>
1      a      head
2      b      head
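The NA replacement can also be written as a single expression. A minimal sketch, assuming a plain `data.frame` rather than the tibble shown above (`ind_and` and `ind_replace` are illustrative names); both variants give the same logical index:

```r
df <- data.frame(
  letter    = c("a", "b", "c", "d", "e"),
  body_part = c("head", "head", NA, NA, "left_foot"),
  stringsAsFactors = FALSE
)

ind <- df$body_part == "head"

# FALSE & NA is FALSE, so the NA entries become FALSE
ind_and <- !is.na(ind) & ind

# base R replace(): set the NA positions to FALSE
ind_replace <- replace(ind, is.na(ind), FALSE)

parts <- split(df, ind_and)
```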

Upvotes: 0
