fugu

Reputation: 6568

Use ifelse and is.null to test for NULL values

I have a data frame:

df <- structure(list(gene = structure(1:6, .Label = c("128up", "14-3-3epsilon", 
"14-3-3zeta", "140up", "18SrRNA-Psi:CR41602", "18SrRNA-Psi:CR45861"
), class = "factor"), fpkm = list(NULL, 0.4, NA_real_, NULL, 
    NULL, NULL)), .Names = c("gene", "fpkm"), row.names = c(NA, 
6L), class = "data.frame")

                 gene fpkm
1               128up NULL
2       14-3-3epsilon  0.4
3          14-3-3zeta   NA
4               140up NULL
5 18SrRNA-Psi:CR41602 NULL
6 18SrRNA-Psi:CR45861 NULL

I would like to add a new column, level, based on the value in fpkm. Where the value is NULL or NA, level should be 'not_expressed'; otherwise it should be 'expressed'.

I'm using mutate to achieve this, for NA and NULL values separately, but this does not have the desired effect on NULL values:

mutate(df, level = ifelse(is.na(fpkm), 'not_expressed' , 'expressed'))

                 gene fpkm         level
1               128up NULL     expressed
2       14-3-3epsilon  0.4     expressed
3          14-3-3zeta   NA not_expressed # Expected
4               140up NULL     expressed
5 18SrRNA-Psi:CR41602 NULL     expressed
6 18SrRNA-Psi:CR45861 NULL     expressed

  mutate(df, level = ifelse(is.null(fpkm), 'not_expressed' , 'expressed'))

                 gene fpkm     level
1               128up NULL expressed
2       14-3-3epsilon  0.4 expressed
3          14-3-3zeta   NA expressed
4               140up NULL expressed
5 18SrRNA-Psi:CR41602 NULL expressed
6 18SrRNA-Psi:CR45861 NULL expressed

I can't work out why this isn't working - is.null(unlist(test$fpkm[1])) returns TRUE

I've also tried: ifelse(is.null(df$fpkm), 'not_expressed', 'expressed')

and: ifelse(is.null(unlist(df$fpkm)), 'not_expressed', 'expressed')

...neither of which works.

Upvotes: 4

Views: 10523

Answers (4)

Manohar Rana

Reputation: 129

As an alternative, you can test the length of each value in the column. Instead of:

mutate(df, level = ifelse(is.na(fpkm), 'not_expressed' , 'expressed'))

Try this:

library(dplyr)    # provides mutate() and if_else()
library(stringr)  # provides str_length()
mutate(df, level = if_else(str_length(fpkm) > 0, 'expressed', 'not_expressed'))

Upvotes: 0

amrrs

Reputation: 6325

Base R is good enough for this:

> df$level <-  ifelse( df$fpkm == 'NULL' | is.na(df$fpkm), 'not_expressed', 'expressed')
> df
                 gene fpkm         level
1               128up NULL not_expressed
2       14-3-3epsilon  0.4     expressed
3          14-3-3zeta   NA not_expressed
4               140up NULL not_expressed
5 18SrRNA-Psi:CR41602 NULL not_expressed
6 18SrRNA-Psi:CR45861 NULL not_expressed

Upvotes: 4

zx8754

Reputation: 56149

Convert NULL to NA, then use ifelse:

# Convert NULL to NA
df$fpkm[ sapply(df$fpkm, is.null) ] <- NA

# I would also drop the list, it is up to you.
# df$fpkm <- unlist(df$fpkm)

# Then use ifelse as usual
df$level <-  ifelse(is.na(df$fpkm), "not_expressed", "expressed")

# result
df
#                  gene fpkm         level
# 1               128up   NA not_expressed
# 2       14-3-3epsilon  0.4     expressed
# 3          14-3-3zeta   NA not_expressed
# 4               140up   NA not_expressed
# 5 18SrRNA-Psi:CR41602   NA not_expressed
# 6 18SrRNA-Psi:CR45861   NA not_expressed
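
A note on the optional unlist() step: the order matters, because unlist() drops NULL elements. A minimal base R sketch, assuming a small list x shaped like the fpkm column (the names x, n_before and fpkm_vec are illustrative, not from the question):

```r
# Hypothetical stand-in for df$fpkm: a list holding NULL, a number and NA
x <- list(NULL, 0.4, NA_real_, NULL)

# unlist() silently drops NULL elements, so the result is shorter than the list
n_before <- length(unlist(x))

# Replace NULL with NA first; unlist() then keeps one value per element
x[vapply(x, is.null, logical(1))] <- NA
fpkm_vec <- unlist(x)

n_before          # 2
length(fpkm_vec)  # 4
```

If unlist() were applied before the NULL-to-NA conversion, the resulting vector would no longer line up with the rows of the data frame.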

Upvotes: 1

CPak

Reputation: 13581

The last statement in your question is a clue to the issue. You had to unlist to get the right result:

unlist(test$fpkm[1])

fpkm is stored as a list column in your data frame (rather than an atomic vector, which is typical), as str(df) shows:

'data.frame':   6 obs. of  2 variables:
 $ gene: Factor w/ 6 levels "128up","14-3-3epsilon",..: 1 2 3 4 5 6
 $ fpkm:List of 6
  ..$ : NULL
  ..$ : num 0.4
  ..$ : num NA
  ..$ : NULL
  ..$ : NULL
  ..$ : NULL

You can get the right result by testing each list element in turn with map_lgl (from the purrr package):

mutate(df, level = ifelse(map_lgl(df$fpkm, ~ is.null(.x) || is.na(.x)), 'not_expressed', 'expressed'))
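
The underlying reason can be sketched in base R without purrr. is.null() is not vectorised: it tests the whole object it is given, and a list column is never NULL itself, so ifelse() sees a single FALSE rather than one test per row. The sketch below assumes a small list standing in for df$fpkm, with every element of length 0 or 1 (the names fpkm, is_missing and level are illustrative):

```r
# Hypothetical stand-in for df$fpkm
fpkm <- list(NULL, 0.4, NA_real_, NULL)

# is.null() looks at the list itself, which is not NULL: a single FALSE
whole_column <- is.null(fpkm)

# Testing element by element yields one logical per row.
# || short-circuits, so is.na() is never called on a NULL element.
is_missing <- vapply(
  fpkm,
  function(x) is.null(x) || is.na(x),
  logical(1)
)

level <- ifelse(is_missing, "not_expressed", "expressed")
level
```

vapply() with logical(1) plays the same role here as map_lgl(): it guarantees exactly one TRUE or FALSE per element, or fails loudly if an element produces anything else.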

Upvotes: 2
