Reputation: 6874
I have the following data:
structure(list(chr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1), leftPos = c(720916L, 736092L, 818159L,
4105086L, 4140849L, 4164911L, 4464314L, 4764317L, 4906564L, 5010398L,
5690705L, 5775286L, 5867036L, 7230474L, 9719183L, 9723741L, 10142816L,
12733035L, 12742872L, 13996088L), Means1 = c(-4.50166373984263,
22.7414854787421, 76.5674149543741, 49.6843654244713, -6.76800756009397,
-5.48616887633702, 171.974765532105, -2.8004419462491, 251.238878920906,
24.3172106993831, 92.07778585458, -9.2775123433234, 110.959507183586,
-2.57278963951353, 30.5966726830686, -6.67824575085661, 13.9003103772433,
99.9034503108899, 11.7786055209302, 57.4114695945089), Means2 = c(-4.50166373984263,
39.4695853682122, 128.134808692772, 140.343227472869, 12.2782402432039,
4.51728744523118, 103.786232453211, 7.2030143753191, 77.4307425703948,
17.496276591372, 46.8586391096806, -9.2775123433234, 90.4074077805074,
18.1392689981969, 32.2700556572673, -6.67824575085661, 17.7360779927443,
72.2607261926, -5.09768550127924, 55.027412413907), Means3 = c(-4.50166373984263,
46.7166260656872, 102.330672294296, 91.1594353147049, -6.76800756009397,
7.86681297424078, 134.793892503456, -2.8004419462491, 103.286968070986,
15.8436793365245, 59.5942459167218, -9.2775123433234, 114.600293200803,
21.6610650365489, 19.763233485681, -6.67824575085661, 21.3471182295293,
101.934037199673, -5.09768550127924, 72.3408810065695), Means4 = c(0,
49.6962372223769, 132.868329968312, 151.140132674376, 19.0462478032979,
10.0034563215682, 106.841760397079, 10.0034563215682, 88.6977192399028,
22.252438424421, 49.7002255511708, 0, 106.856532283882, 20.7120586377105,
34.6415203071056, 0, 20.7120586377105, 75.2943986695856, 0, 58.1026432784387
)), .Names = c("chr", "leftPos", "Means1", "Means2", "Means3",
"Means4"), row.names = c(NA, 20L), class = "data.frame")
I would like to group by chr and leftPos, and get a mean of each of the other columns in that group.
I tried the following to do this:
CLL <- function (col) {
col <- as.data.frame(RawZoutliers %>%
group_by(chr, binnum = (leftPos) %/% 500000) %>%
summarise(Means = mean(Means)) %>%
mutate(leftPos = (binnum+1) * 120000) %>%
select(leftPos, Means))
}
RawZoutliersBin<-lapply(RawZoutliers, CLL)
but it was a failure. The error says:
Error during wrapup: no applicable method for 'group_by_' applied to an object of class "c('double', 'numeric')"
Error during wrapup: target context is not on the stack
Error during wrapup: error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': target context is not on the stack)
I'm not sure how to do this
Upvotes: 1
Views: 929
Reputation: 24074
You can get the mean of every column by chr
and leftPos
with:
aggregate(. ~ chr + leftPos, data=mydata, FUN=mean)
In your example data, as your leftPos
are all distinct, you will get your original data.frame as result.
For example, to get the mean by chr
of every column, apart from chr
(including leftPos
):
aggregate(. ~ chr, data=mydata, FUN=mean)
# chr leftPos Means1 Means2 Means3 Means4
#1 1 6372642 48.75335 41.89019 43.90577 47.82846
EDIT
If you want to do the means by "bins" of leftPos
, here is a way with cut
mydata$bin <- cut(mydata$leftPos, breaks=c(seq(min(mydata$leftPos), max(mydata$leftPos), by=500000), max(mydata$leftPos)), include.lowest=T, labels=FALSE)
(you don't have to do a call to cut
by chromosome as you are further subtyping your data also by chromosome).
Changing the last 4 chr
to 2, you'll get:
aggregate(. ~ chr + bin, data=mydata, FUN=mean)
# chr bin leftPos Means1 Means2 Means3 Means4
#1 1 1 758389 31.602412 54.367577 48.181878 60.85486
#2 1 7 4136949 12.476730 52.379585 30.752747 60.06328
#3 1 8 4464314 171.974766 103.786232 134.793893 106.84176
#4 1 9 4893760 90.918549 34.043345 38.776735 40.31787
#5 1 10 5690705 92.077786 46.858639 59.594246 49.70023
#6 1 11 5821161 50.840997 40.564948 52.661390 53.42827
#7 1 14 7230474 -2.572790 18.139269 21.661065 20.71206
#8 1 18 9719183 30.596673 32.270056 19.763233 34.64152
#9 1 19 9723741 -6.678246 -6.678246 -6.678246 0.00000
#10 2 19 10142816 13.900310 17.736078 21.347118 20.71206
#11 2 25 12737954 55.841028 33.581520 48.418176 37.64720
#12 2 27 13996088 57.411470 55.027412 72.340881 58.10264
Upvotes: 3