Reputation: 1337
I have the following data and would like to calculate the mean values for t1-t5, grouped by iid, for every cid.
Sample data:
cid iid t1 t2 t3 t4 t5
4503 2 4 5 5 6 7
4503 2 7 5 3 2 8
841 2 9 2 1 5 7
2134 1 6 9 8 2 1
Here is my code so far. Can somebody please help me to finish it off. Thanks in advance.
library(dplyr)
test <- read.csv("~/Documents/R-SCRIPTS/DATA/test.csv", sep=";")
t <- test %>%
group_by(cid, iid) %>%
select(t1:t5) %>%
summarise(t1 = mean(t1, na.rm = TRUE),
t2 = mean(t2,na.rm = TRUE),
t3 = mean(t3,na.rm = TRUE),
t4 = mean(t4,na.rm = TRUE),
t5 = mean(t5,na.rm = TRUE)
)
Upvotes: 2
Views: 445
Reputation: 70336
If I understand correctly, you could simply use:
test %>%
group_by(cid, iid) %>%
summarise_each(funs(mean(.[.>0], na.rm = TRUE)), t1:t5)
#Source: local data frame [3 x 7]
#Groups: cid [?]
#
# cid iid t1 t2 t3 t4 t5
# (int) (int) (dbl) (dbl) (dbl) (dbl) (dbl)
#1 841 2 9.0 2 1 5 7.0
#2 2134 1 6.0 9 8 2 1.0
#3 4503 2 5.5 5 4 4 7.5
Upvotes: 1
Reputation: 585
Is this what you want? I do not use other packages, but use the colMeans()
.
Here is an example:
The data looks like (a short copy of your example)
> mydata
iid t1 t2 t3
1 2 4 5 5
2 2 7 5 3
3 2 9 2 1
4 1 6 9 8
The code:
id_list <- unique(mydata$iid) # get the id
result <- matrix(nrow=0, ncol=4) # create a matrix to store result
colnames(result) <- colnames(mydata) # name the columns of the matrix
for (i in 1:length(id_list)){
uid <- id_list[i]
# for each id, calculate the column averages
average <- unname(colMeans(mydata[mydata$iid==uid,2:4]))
# write to the result
result <- rbind(result, c(uid, average))
}
result
The results looks like:
> result
iid t1 t2 t3
[1,] 2 6.666667 4 3
[2,] 1 6.000000 9 8
For your problem, you need to change the colMeans(mydata[mydata$iid==uid,2:4])
to colMeans(mydata[mydata$iid==uid,2:201])
, which are your column index you want the mean. And change ncol
in matrix(nrow=0, ncol=4)
corresponding the result data you want.
For values < 0, you can first covert the negative values into NA, mydata[,2:4][mydata[,2:4]<0]<-NA
. Then add na.rm=TRUE
in colMeans()
.
Update the same example:
> mydata
iid t1 t2 t3
1 2 4 5 5
2 2 -2 5 3
3 2 9 2 1
4 1 6 9 -1
Code:
mydata[,2:4][mydata[,2:4]<0]<-NA
id_list <- unique(mydata$iid)
result <- matrix(nrow=0, ncol=4)
colnames(result) <- colnames(mydata)
for (i in 1:length(id_list)){
uid <- id_list[i]
average <- unname(colMeans(mydata[mydata$iid==uid,2:4], na.rm=TRUE))
result <- rbind(result, c(uid, average))
}
result
Result:
> result
iid t1 t2 t3
[1,] 2 6.5 4 3
[2,] 1 6.0 9 NaN
Upvotes: 0