Reputation: 5049
I have a data-frame dfu
that holds for each id
(id belongs to one team
, team
has many ids
) the percentage samples where a bunch of properties prop1
, prop2
and so on are observed based on some past studies - this is used as sort of reference table for future studies. Now there is data from new experiment which gives a new set of ids
. I need to find the percentage samples where prop1
, prop2
and so on are observed on per team
basis by using the reference data in dfu
. This could be done by counting the number of occurrences per id
in dfi
and then take a weighted average grouped by team
.- not all ids in dfu
may be present and one or more ids not present in dfu
may be present in dfi
. The ids not present in dfu
may be excluded from the weighted average as no presence per property values are available for them.
dfu <- data.frame(id=1:6, team=c('A',"B","C","A","A","C"), prop1=c(0.8,0.9,0.6,0.5,0.8,0.9), prop2=c(0.2,0.3,.3,.2,.2,.3))
> dfu
id team prop1 prop2
1 A 0.8 0.2
2 B 0.9 0.3
3 C 0.6 0.3
4 A 0.5 0.2
5 A 0.8 0.2
6 C 0.9 0.3
>
> dfi <- data.frame(id=c(2 , 3 , 2 , 1 , 4 , 3 , 7))
> dfi
id
2
3
2
1
4
3
7
The output format would be like below. For example the value for prop1
for group A
would be (0.8*1 + 0.5*1)/2 = 0.65
.
team prop1 prop2
A
B
C
prefer base R approach, other approaches welcome. The number of columns could be many.
Upvotes: 0
Views: 51
Reputation: 5049
taking cue from @DJJ's answer.
dfu <- data.frame(id=1:6, team=c('A',"B","C","A","A","C"),
prop1=c(0.8,0.9,0.6,0.5,0.8,0.9),
prop2=c(0.2,0.3,.3,.2,.2,.3))
dfi <- data.frame(id=c(2 , 3 , 2 , 1 , 4 , 3 , 7))
Merge by id
> dfx <- merge(dfi, dfu, by="id")
> dfx
id team prop1 prop2
1 1 A 0.8 0.2
2 2 B 0.9 0.3
3 2 B 0.9 0.3
4 3 C 0.6 0.3
5 3 C 0.6 0.3
6 4 A 0.5 0.2
Aggregate prop1
and prop2
by team
with mean
> aggregate(cbind(prop1, prop2) ~ team, dfx, mean)
team prop1 prop2
1 A 0.65 0.2
2 B 0.90 0.3
3 C 0.60 0.3
Upvotes: 1
Reputation: 2539
I don't know exactly how to do it with base R.
With data.table it's should be pretty easy. Let convert your data.frames into data.table.
library(data.table)
dfu <- data.frame(id=1:6, team=c('A',"B","C","A","A","C"), prop1=c(0.8,0.9,0.6,0.5,0.8,0.9), prop2=c(0.2,0.3,.3,.2,.2,.3))
dfi <- data.frame(id=c(2 , 3 , 2 , 1 , 4 , 3 , 7))
dfi <- data.table(dfi)
dfu <- data.table(dfu)
Then merge them like
dfu[dfi,on="id"]
## > dfu[dfi,on="id"]
## id team prop1 prop2
## 1: 2 B 0.9 0.3
## 2: 3 C 0.6 0.3
## 3: 2 B 0.9 0.3
## 4: 1 A 0.8 0.2
## 5: 4 A 0.5 0.2
## 6: 3 C 0.6 0.3
## 7: 7 NA NA NA
Then we just have to perform the mean by group. In fact we can to it one liner like
dfu[dfi,on="id"][,mean(prop1),team]
## > dfu[dfi,on="id"][,mean(prop1),team]
## team V1
## 1: B 0.90
## 2: C 0.60
## 3: A 0.65
## 4: NA NA
You can achieve the same thing in base R by merging the data.frame and using the function aggregate
I guess.
Upvotes: 1