vanish007

Reputation: 323

Aggregating via dplyr - mutating a single column from factor to numeric

Hi and thank you for reading.

I've been trying to aggregate some data and have been able to do it with the aggregate function, but I also wanted to do the same thing with a dplyr pipeline. However, I keep receiving this error:

Error in mutate_impl(.data, dots) : Evaluation error: could not find function "15.2".

I currently have this data set p:

    sample    gene           ct
1    s001     gapdh         15.2
2    s001     gapdh           16
3    s001     gapdh         14.8
4    s002     gapdh         16.2
5    s002     gapdh           17
6    s002     gapdh         16.7
7    s003     gapdh Undetermined
8    s003     gapdh         14.6
9    s003     gapdh           15
10   s001      actb         24.5
11   s001      actb         24.2 
12   s001      actb         24.7
13   s002      actb           25
14   s002      actb         25.7
15   s002      actb         25.5
16   s003      actb         27.3
17   s003      actb         27.4
18   s003      actb Undetermined

and want to get it to:

  p2$sample p2$gene  p2$ct.mean    p2$ct.sd
1      s001    actb 24.46666667  0.25166115
2      s002    actb 25.40000000  0.36055513
3      s003    actb 27.35000000  0.07071068
4      s001   gapdh 15.33333333  0.61101009
5      s002   gapdh 16.63333333  0.40414519
6      s003   gapdh 14.80000000  0.28284271

The code I'm currently using that results in the above error:

library(dplyr)

p_ave_sd <- p %>% 
  filter(p$ct != "Undetermined") %>%
  mutate_at(as.character(p$ct), as.numeric, rm.na = TRUE) %>%
  group_by(p$gene) %>% 
  summarise(mean=mean(p$ct), sd=sd(p$ct))

It's definitely the "mutate" step that's tripping me up. I've tried mutate_all(), mutate_if(is.factor, is.numeric) and similar, but each gives its own error.

Thanks for the help!

Upvotes: 0

Views: 158

Answers (2)

hamagust

Reputation: 854

I am not sure I understood your question, but one possibility is:

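p_ave_sd <- p %>% 
   # the value in the data is capitalised, so match it exactly
   filter(ct != "Undetermined") %>%
   # go through as.character() first in case ct is a factor
   mutate(ct = as.numeric(as.character(ct))) %>%
   group_by(gene, sample) %>% 
   summarise(mean = mean(ct), sd = sd(ct))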
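
If ct was read in as a factor (as the question title suggests), calling as.numeric() on it directly returns the internal level codes rather than the printed values. A minimal base R sketch of the difference, using a made-up vector f rather than the real column:

```r
# Hypothetical factor standing in for the ct column
f <- factor(c("15.2", "16", "Undetermined", "14.8"))

# Converting the factor directly gives the integer level codes
codes <- as.numeric(f)

# Dropping the text value and going through as.character() first
# recovers the actual measurements
keep <- f != "Undetermined"
values <- as.numeric(as.character(f[keep]))

print(codes)
print(values)
```

If ct is already a character column, the extra as.character() call is harmless.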

Upvotes: 0

www

Reputation: 39174

Here is a way to use mutate_at. If you only have one column to convert, mutate will also work and is more straightforward.

library(dplyr)

dat2 <- dat %>%
  filter(!ct %in% "Undetermined") %>%
  # mutate(ct = as.numeric(ct)) %>% <<< This will also work
  mutate_at(vars(ct), as.numeric) %>%
  group_by(sample, gene) %>% 
  summarise(mean = mean(ct), sd = sd(ct)) %>%
  ungroup()

dat2
# # A tibble: 6 x 4
#   sample gene   mean     sd
#   <chr>  <chr> <dbl>  <dbl>
# 1 s001   actb   24.5 0.252 
# 2 s001   gapdh  15.3 0.611 
# 3 s002   actb   25.4 0.361 
# 4 s002   gapdh  16.6 0.404 
# 5 s003   actb   27.4 0.0707
# 6 s003   gapdh  14.8 0.283 

DATA

dat <- read.table(text = "    sample    gene           ct
1    s001     gapdh         15.2
                  2    s001     gapdh           16
                  3    s001     gapdh         14.8
                  4    s002     gapdh         16.2
                  5    s002     gapdh           17
                  6    s002     gapdh         16.7
                  7    s003     gapdh Undetermined
                  8    s003     gapdh         14.6
                  9    s003     gapdh           15
                  10   s001      actb         24.5
                  11   s001      actb         24.2 
                  12   s001      actb         24.7
                  13   s002      actb           25
                  14   s002      actb         25.7
                  15   s002      actb         25.5
                  16   s003      actb         27.3
                  17   s003      actb         27.4
                  18   s003      actb Undetermined",
                  header = TRUE, stringsAsFactors = FALSE)
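
In dplyr 1.0.0 and later, the scoped verbs such as mutate_at() are superseded by across(). A sketch of the same pipeline in that style, assuming a recent dplyr is installed and ct is a character column; the data frame is retyped by hand here so the block runs on its own, and the result name dat3 is only illustrative:

```r
library(dplyr)

# Same values as the table in the question, entered manually
dat <- data.frame(
  sample = rep(rep(c("s001", "s002", "s003"), each = 3), times = 2),
  gene   = rep(c("gapdh", "actb"), each = 9),
  ct     = c("15.2", "16", "14.8", "16.2", "17", "16.7",
             "Undetermined", "14.6", "15",
             "24.5", "24.2", "24.7", "25", "25.7", "25.5",
             "27.3", "27.4", "Undetermined"),
  stringsAsFactors = FALSE
)

dat3 <- dat %>%
  filter(ct != "Undetermined") %>%                 # drop the text rows
  mutate(across(ct, as.numeric)) %>%               # character -> numeric
  group_by(sample, gene) %>%
  summarise(mean = mean(ct), sd = sd(ct), .groups = "drop")

print(dat3)
```

Here .groups = "drop" takes the place of the trailing ungroup() call.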

Upvotes: 1
