Reputation: 6568
I have a data frame:
df <- structure(list(gene = structure(1:6, .Label = c("128up", "14-3-3epsilon",
"14-3-3zeta", "140up", "18SrRNA-Psi:CR41602", "18SrRNA-Psi:CR45861"
), class = "factor"), fpkm = list(NULL, 0.4, NA_real_, NULL,
NULL, NULL)), .Names = c("gene", "fpkm"), row.names = c(NA,
6L), class = "data.frame")
gene fpkm
1 128up NULL
2 14-3-3epsilon 0.4
3 14-3-3zeta NA
4 140up NULL
5 18SrRNA-Psi:CR41602 NULL
6 18SrRNA-Psi:CR45861 NULL
I would like to add a new column, level, based on the value in fpkm. Where the value is NULL or NA, I would like level to be 'not_expressed'; otherwise it should be 'expressed'.
I'm using mutate to achieve this, for NA and NULL values separately, but this does not have the desired effect on NULL values:
mutate(df, level = ifelse(is.na(fpkm), 'not_expressed' , 'expressed'))
gene fpkm level
1 128up NULL expressed
2 14-3-3epsilon 0.4 expressed
3 14-3-3zeta NA not_expressed # Expected
4 140up NULL expressed
5 18SrRNA-Psi:CR41602 NULL expressed
6 18SrRNA-Psi:CR45861 NULL expressed
mutate(df, level = ifelse(is.null(fpkm), 'not_expressed' , 'expressed'))
gene fpkm level
1 128up NULL expressed
2 14-3-3epsilon 0.4 expressed
3 14-3-3zeta NA expressed
4 140up NULL expressed
5 18SrRNA-Psi:CR41602 NULL expressed
6 18SrRNA-Psi:CR45861 NULL expressed
I can't work out why this isn't working, given that is.null(unlist(df$fpkm[1])) returns TRUE.
I've also tried:
ifelse(is.null(df$fpkm), 'not_expressed', 'expressed')
and:
ifelse(is.null(unlist(df$fpkm)), 'not_expressed', 'expressed')
...neither of which works.
Upvotes: 4
Views: 10523
Reputation: 129
As an alternative, you can try checking the length of the column values. Instead of:
mutate(df, level = ifelse(is.na(fpkm), 'not_expressed' , 'expressed'))
Try this:
library(stringr)
mutate(df, level = if_else(str_length(fpkm)>0, 'expressed','not_expressed'))
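The same length idea can be sketched in base R without stringr. This is only an illustration: fpkm below is a stand-in for df$fpkm with values copied from the question, and it assumes the column is a list. lengths() reports 0 for NULL entries, but an NA entry still has length 1, so a separate is.na() check is needed.

```r
# Stand-in for df$fpkm (a list column); values copied from the question.
fpkm <- list(NULL, 0.4, NA_real_, NULL)

# lengths() gives the length of each list element; NULL has length 0.
has_value <- lengths(fpkm) > 0

# On a list, is.na() is TRUE only for length-one atomic elements that are NA.
level <- ifelse(has_value & !is.na(fpkm), "expressed", "not_expressed")
```

Here level comes out with one entry per row, which is what mutate or df$level <- expects.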
Upvotes: 0
Reputation: 6325
Base R is good enough for this:
> df$level <- ifelse( df$fpkm == 'NULL' | is.na(df$fpkm), 'not_expressed', 'expressed')
> df
gene fpkm level
1 128up NULL not_expressed
2 14-3-3epsilon 0.4 expressed
3 14-3-3zeta NA not_expressed
4 140up NULL not_expressed
5 18SrRNA-Psi:CR41602 NULL not_expressed
6 18SrRNA-Psi:CR45861 NULL not_expressed
Upvotes: 4
Reputation: 56149
Convert NULL to NA, then use ifelse:
# Convert NULL to NA
df$fpkm[ sapply(df$fpkm, is.null) ] <- NA
# I would also drop the list, it is up to you.
# df$fpkm <- unlist(df$fpkm)
# Then use ifelse as usual
df$level <- ifelse(is.na(df$fpkm), "not_expressed", "expressed")
# result
df
# gene fpkm level
# 1 128up NA not_expressed
# 2 14-3-3epsilon 0.4 expressed
# 3 14-3-3zeta NA not_expressed
# 4 140up NA not_expressed
# 5 18SrRNA-Psi:CR41602 NA not_expressed
# 6 18SrRNA-Psi:CR45861 NA not_expressed
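The order of the two steps matters if you do drop the list. A small sketch, using a plain list fpkm as a stand-in for df$fpkm (values copied from the question), shows that unlist() discards NULL entries, so flattening before the NULL-to-NA conversion leaves fewer values than rows:

```r
# Stand-in for df$fpkm (a list column); values copied from the question.
fpkm <- list(NULL, 0.4, NA_real_, NULL)

# unlist() silently drops NULL entries, so the result no longer lines up with the rows.
dropped <- unlist(fpkm)          # length 2, not 4

# Replace NULL with NA first, then flatten to a plain numeric vector.
fpkm[vapply(fpkm, is.null, logical(1))] <- NA
flat <- unlist(fpkm)             # one value per row

level <- ifelse(is.na(flat), "not_expressed", "expressed")
```

This is probably also why ifelse(is.null(unlist(df$fpkm)), ...) in the question does not help: is.null() sees the flattened vector as a whole and returns a single FALSE.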
Upvotes: 1
Reputation: 13581
The last statement is a clue to your issue. You had to unlist to get the right result:
unlist(df$fpkm[1])
fpkm is stored as a list in your data frame (not as an atomic vector, which is the typical case):
str(df)
'data.frame': 6 obs. of 2 variables:
$ gene: Factor w/ 6 levels "128up","14-3-3epsilon",..: 1 2 3 4 5 6
$ fpkm:List of 6
..$ : NULL
..$ : num 0.4
..$ : num NA
..$ : NULL
..$ : NULL
..$ : NULL
You can get the right result by testing each element of the list:
library(dplyr); library(purrr)
mutate(df, level = ifelse(map_lgl(fpkm, ~is.null(.x) || is.na(.x)), 'not_expressed' , 'expressed'))
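To see why the element-wise test is needed, here is a base R sketch (no purrr), with fpkm as a stand-in for df$fpkm using values from the question. is.null() is not vectorised: called on the whole column it tests the list object itself and returns a single FALSE, which ifelse turns into a single 'expressed' that mutate then recycles to every row.

```r
# Stand-in for df$fpkm (a list column); values copied from the question.
fpkm <- list(NULL, 0.4, NA_real_, NULL)

# is.null() tests the list object itself, which is not NULL: one FALSE for the whole column.
whole_column <- is.null(fpkm)

# Applying the test to each element gives one logical value per row.
# `||` short-circuits, so is.na() is never evaluated on a NULL element.
per_element <- vapply(fpkm, function(x) is.null(x) || is.na(x), logical(1))

level <- ifelse(per_element, "not_expressed", "expressed")
```

vapply with logical(1) plays the same role here as map_lgl: it guarantees exactly one TRUE/FALSE per list element.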
Upvotes: 2