Reputation: 1
New here and new to R, so please bear with me.
I have a data.frame similar to this:
time. variable TEER
1 0.07 cntrl 234.2795
2 1.07 cntrl 602.8245
3 2.07 cntrl 703.6844
4 3.07 cntrl 699.4538
...
48 0.07 cntrl 234.2795
49 1.07 cntrl 602.8245
50 2.07 cntrl 703.6844
51 3.07 cntrl 699.4538
...
471 0.07 agr1111 251.9119
472 1.07 agr1111 480.1573
473 2.07 agr1111 629.3744
474 3.07 agr1111 676.6782
...
518 0.07 agr1111 251.9119
519 1.07 agr1111 480.1573
520 2.07 agr1111 629.3744
521 3.07 agr1111 676.6782
...
753 0.07 agr2222 350.1049
754 1.07 agr2222 306.6072
755 2.07 agr2222 346.0387
756 3.07 agr2222 447.0137
757 4.07 agr2222 530.2433
...
802 2.07 agr2222 346.0387
803 3.07 agr2222 447.0137
804 4.07 agr2222 530.2433
805 5.07 agr2222 591.2122
I'm trying to apply ddply() to this data frame to get a new data frame with means and standard errors (to plot later), like so:
library(plyr)
ddply(data_melt, c("time.", "variable"), summarise,
      mean = mean(TEER), sd = sd(TEER),
      sem = sd(TEER)/sqrt(length(TEER)))
The output data frame has the same TEER values in the mean column as the first rows of the original data frame, and zeroes in the sd and sem columns. I also get a warning:
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
  duplicated levels in factors are deprecated
It looks like the function only goes through the first part of the data frame and ignores the repeated time. and variable combinations further down?
I already tried looking at the solutions to similar problems here but nothing seems to work. Am I missing something or is this a legitimate problem?
Any help / tips appreciated.
P.S. Let me know if I'm not explaining the problem clearly enough and I'll try to go into more detail.
Upvotes: 0
Views: 964
Reputation: 1
I think I've found a way around my problem.
Initially, when I load the data frame, each of the variable names ("cntrl", "agr1111", "agr2222") has a unique letter and number appended ("A1", "A2", "B1", "B2"), so they look like this: "cntrl.A1", "agr1111.B2". Instead of stripping the letter-number suffix from each of them using gsub, I tried using filter with grepl to isolate the rows I need and then summarise.
Here's the code:
library(dplyr)
dt_11 <- dt %>%
  group_by(time.) %>%
  filter(grepl("agr1111", variable)) %>%
  summarise(avg_11 = mean(teer),
            sd_11 = sd(teer),
            sem_11 = sd(teer)/sqrt(length(teer)))
This only gives me a data frame for one group ("agr1111"), so I'll have to do it two more times, for "cntrl" and "agr2222", resulting in three data frames. But I'm sure I'll be able to either merge the data frames or plot them separately on the same graph.
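A possible shortcut for the approach above, as a minimal sketch: rather than filtering one treatment at a time, strip the well suffix with sub() and group by both time. and variable, which gives all treatments in a single data frame. The toy dt, its values, the output column names, and the suffix pattern "\\.[A-Z][0-9]+$" are assumptions based on the names described above ("cntrl.A1", "agr1111.B2"); adjust the pattern if the real labels differ.

```r
library(dplyr)

# Hypothetical stand-in for dt: two wells per treatment, made-up TEER values
dt <- data.frame(
  time. = rep(c(0.07, 1.07), times = 4),
  variable = rep(c("cntrl.A1", "cntrl.A2", "agr1111.B1", "agr1111.B2"), each = 2),
  teer = c(234, 603, 240, 610, 252, 480, 260, 470)
)

dt_summary <- dt %>%
  # Drop the trailing ".A1"-style label so replicate wells share one group name
  mutate(variable = sub("\\.[A-Z][0-9]+$", "", as.character(variable))) %>%
  # One group per time point and treatment
  group_by(time., variable) %>%
  summarise(avg = mean(teer),
            sd_teer = sd(teer),
            sem_teer = sd(teer) / sqrt(n())) %>%
  ungroup()

print(dt_summary)
```

With the toy data this produces one row per time point and treatment (four rows), so the result can go straight into a single plot without merging separate data frames.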
Upvotes: 0
Reputation: 8413
This isn't really an answer, but it's too long for a comment:
I ran your exact code and everything works fine!
> ddply(dt, c("time.", "variable"), summarise,
+ mean = mean(TEER), sd = sd(TEER),
+ sem = sd(TEER)/sqrt(length(TEER)), count = length(TEER))
# time.  variable  mean      sd  sem  count
# 0.07   agr1111   251.9119  0   0    2
# 0.07   agr2222   350.1049  NA  NA   1
# 0.07   cntrl     234.2795  0   0    2
# 1.07   agr1111   480.1573  0   0    2
# 1.07   agr2222   306.6072  NA  NA   1
# 1.07   cntrl     602.8245  0   0    2
# 2.07   agr1111   629.3744  0   0    2
# 2.07   agr2222   346.0387  0   0    2
# 2.07   cntrl     703.6844  0   0    2
# 3.07   agr1111   676.6782  0   0    2
# 3.07   agr2222   447.0137  0   0    2
# 3.07   cntrl     699.4538  0   0    2
# 4.07   agr2222   530.2433  0   0    2
# 5.07   agr2222   591.2122  NA  NA   1
> sessionInfo()
#other attached packages:
#[1] plyr_1.8.4
Could you update to the latest version of the packages? I am not sure what is causing your problem. I hope you understand how sd is actually calculated and why NA appears. (HINT: look at the count column.)
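To spell out the hint, here is a small base R sketch of why the sd column shows NA or 0. The sample standard deviation divides by n - 1, so a group with a single row has no defined sd, and a group whose rows are exact duplicates has zero spread. The first two inputs are taken from the output above; the second value in x is made up for illustration.

```r
# One observation: n - 1 = 0, so sd() returns NA
sd_single <- sd(350.1049)

# Two identical observations: every deviation from the mean is zero, so sd() is 0
sd_duplicates <- sd(c(234.2795, 234.2795))

# Replicates that actually differ give a non-zero standard error
x <- c(234.2795, 240.1)  # second value is hypothetical
sem_x <- sd(x) / sqrt(length(x))

print(c(sd_single, sd_duplicates, sem_x))
```

So the zeroes in the sample output come from the example rows being exact copies of each other, not from ddply skipping rows.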
Upvotes: -1