Reputation: 197
df:
y x
F T
F F
T T
T F
df1:
y z probs.x x probs.y new
F F 0.08 T 0.4 0.032
F F 0.24 F 0.4 0.096
F T 0.12 T 0.6 0.072
F T 0.36 F 0.6 0.216
T F 0.40 T 0.5 0.200
T F 0.20 F 0.5 0.100
T T 0.40 T 0.5 0.200
T T 0.20 F 0.5 0.100
df and df1 are the two data frames. And for each row of df, I want to select the matching rows in df1, add the values in column “new”, and store output in a new data frame like this.
df_res:
y x new
F T .104
F F .312
T T .4
T F .2
Kindly help me out! I have been toiling over this for a long time now. The table headers will change according to the variables, so please do do not hard code the table headers.
Thanks.
Upvotes: 1
Views: 625
Reputation: 12468
Here's an answer using merge and plyr.
Read in your example data.frame:
df1 <- read.table(text="y z probs.x x probs.y new
F F 0.08 T 0.4 0.032
F F 0.24 F 0.4 0.096
F T 0.12 T 0.6 0.072
F T 0.36 F 0.6 0.216
T F 0.40 T 0.5 0.200
T F 0.20 F 0.5 0.100
T T 0.40 T 0.5 0.200
T T 0.20 F 0.5 0.100", header=T, sep="")
If I understand, there are 2 steps to what your asking. First is to select rows in df1 that match patterns in df. That can be done with merge. The df you gave has all combinations of True and False for x and y. Let's leave one out so we can see the effect:
df <- read.table(text="y x
F T
T T
T F",header=T,sep="")
df_merged <- merge(df, df1, all.y=F)
The results are a new data.frame the omits the rows where both x and y are F. This is equivalent to a left join in a SQL database.
y x z probs.x probs.y new
1 FALSE TRUE FALSE 0.08 0.4 0.032
2 FALSE TRUE TRUE 0.12 0.6 0.072
3 TRUE FALSE FALSE 0.20 0.5 0.100
4 TRUE FALSE TRUE 0.20 0.5 0.100
5 TRUE TRUE FALSE 0.40 0.5 0.200
6 TRUE TRUE TRUE 0.40 0.5 0.200
The second part of the question is to group the data and apply a sum to the groups. Plyr is a great tool for this kind of data manipulation:
library(plyr)
ddply(df_merged, .(y,x), function(df) c(new=sum(df$new)))
The dd means we are giving a data.frame and want a data.frame as a result. The next argument .(y,x) is a quoted expression and names the variables we're grouping by. The result is this:
y x new
1 FALSE TRUE 0.104
2 TRUE FALSE 0.200
3 TRUE TRUE 0.400
Upvotes: 1
Reputation: 3711
it seems like if all you want is F,T combination. this works. otherwise you have to write more clearly.
text=" y z probs.x x probs.y new
F F 0.08 T 0.4 0.032
F F 0.24 F 0.4 0.096
F T 0.12 T 0.6 0.072
F T 0.36 F 0.6 0.216
T F 0.40 T 0.5 0.200
T F 0.20 F 0.5 0.100
T T 0.40 T 0.5 0.200
T T 0.20 F 0.5 0.100"
df<-read.table(text=text, header=T)
df_res<-aggregate(data=df, new~interaction(y,x),sum)
interaction(y, x) new
1 FALSE.FALSE 0.312
2 TRUE.FALSE 0.200
3 FALSE.TRUE 0.104
4 TRUE.TRUE 0.400
Upvotes: 1
Reputation: 6118
I don't know how long is your data but this can be one approach.
df<- read.table(text="y x
F T
F F
T T
T F",header=T,sep="")
df1 <- read.table(text="y z probs.x x probs.y new
F F 0.08 T 0.4 0.032
F F 0.24 F 0.4 0.096
F T 0.12 T 0.6 0.072
F T 0.36 F 0.6 0.216
T F 0.40 T 0.5 0.200
T F 0.20 F 0.5 0.100
T T 0.40 T 0.5 0.200
T T 0.20 F 0.5 0.100", header=T, sep="")
df$yx <- paste0(df$y,df$x)
df1$yx <- paste0(df1$y, df1$x)
# Update automatically using the for loop
for (i in 1:4){
new[i] <- sum(df1[which(df1[,7]==df[i,3]),6])
}
df$new <- new
df
y x yx new
1 FALSE TRUE FALSETRUE 0.104
2 FALSE FALSE FALSEFALSE 0.312
3 TRUE TRUE TRUETRUE 0.400
4 TRUE FALSE TRUEFALSE 0.200
Using sapply
new <- sapply(1:4, function(x) sum(df1[which(df1[,7]==df[x,3]),6]))
Upvotes: 2