Reputation: 197

Subsetting a dataframe based on another dataframe in R

df:

  y x
  F T
  F F
  T T
  T F

df1:

 y z probs.x x probs.y   new
 F F    0.08 T     0.4 0.032
 F F    0.24 F     0.4 0.096
 F T    0.12 T     0.6 0.072
 F T    0.36 F     0.6 0.216
 T F    0.40 T     0.5 0.200
 T F    0.20 F     0.5 0.100
 T T    0.40 T     0.5 0.200
 T T    0.20 F     0.5 0.100

df and df1 are the two data frames. And for each row of df, I want to select the matching rows in df1, add the values in column “new”, and store output in a new data frame like this.

df_res:

  y x new
  F T .104
  F F .312
  T T .4
  T F .2

Kindly help me out! I have been toiling over this for a long time now. The table headers will change according to the variables, so please do do not hard code the table headers.

Thanks.

Upvotes: 1

Answers (3)

cbare

Reputation: 12468

Here's an answer using merge and plyr.

Read in your example data.frame:

df1 <- read.table(text="y z probs.x x probs.y   new
 F F    0.08 T     0.4 0.032
 F F    0.24 F     0.4 0.096
 F T    0.12 T     0.6 0.072
 F T    0.36 F     0.6 0.216
 T F    0.40 T     0.5 0.200
 T F    0.20 F     0.5 0.100
 T T    0.40 T     0.5 0.200
 T T    0.20 F     0.5 0.100", header=T, sep="")

If I understand, there are 2 steps to what your asking. First is to select rows in df1 that match patterns in df. That can be done with merge. The df you gave has all combinations of True and False for x and y. Let's leave one out so we can see the effect:

df <- read.table(text="y x
F T
T T
T F",header=T,sep="")

df_merged <- merge(df, df1, all.y=F)

The results are a new data.frame the omits the rows where both x and y are F. This is equivalent to a left join in a SQL database.

      y     x     z probs.x probs.y   new
1 FALSE  TRUE FALSE    0.08     0.4 0.032
2 FALSE  TRUE  TRUE    0.12     0.6 0.072
3  TRUE FALSE FALSE    0.20     0.5 0.100
4  TRUE FALSE  TRUE    0.20     0.5 0.100
5  TRUE  TRUE FALSE    0.40     0.5 0.200
6  TRUE  TRUE  TRUE    0.40     0.5 0.200

The second part of the question is to group the data and apply a sum to the groups. Plyr is a great tool for this kind of data manipulation:

library(plyr)
ddply(df_merged, .(y,x), function(df) c(new=sum(df$new)))

The dd means we are giving a data.frame and want a data.frame as a result. The next argument .(y,x) is a quoted expression and names the variables we're grouping by. The result is this:

      y     x   new
1 FALSE  TRUE 0.104
2  TRUE FALSE 0.200
3  TRUE  TRUE 0.400

Upvotes: 1

Ananta

Reputation: 3711

it seems like if all you want is F,T combination. this works. otherwise you have to write more clearly.

text=" y z probs.x x probs.y   new
 F F    0.08 T     0.4 0.032
 F F    0.24 F     0.4 0.096
 F T    0.12 T     0.6 0.072
 F T    0.36 F     0.6 0.216
 T F    0.40 T     0.5 0.200
 T F    0.20 F     0.5 0.100
 T T    0.40 T     0.5 0.200
 T T    0.20 F     0.5 0.100"

df<-read.table(text=text, header=T)
df_res<-aggregate(data=df, new~interaction(y,x),sum)

  interaction(y, x)   new
1       FALSE.FALSE 0.312
2        TRUE.FALSE 0.200
3        FALSE.TRUE 0.104
4         TRUE.TRUE 0.400

Upvotes: 1

Jd Baba

Reputation: 6118

I don't know how long is your data but this can be one approach.

df<- read.table(text="y x
F T
F F
T T
T F",header=T,sep="")

df1 <- read.table(text="y z probs.x x probs.y   new
 F F    0.08 T     0.4 0.032
 F F    0.24 F     0.4 0.096
 F T    0.12 T     0.6 0.072
 F T    0.36 F     0.6 0.216
 T F    0.40 T     0.5 0.200
 T F    0.20 F     0.5 0.100
 T T    0.40 T     0.5 0.200
 T T    0.20 F     0.5 0.100", header=T, sep="")

df$yx <- paste0(df$y,df$x)
df1$yx <- paste0(df1$y, df1$x)

# Update automatically using the for loop

for (i in 1:4){
  new[i] <- sum(df1[which(df1[,7]==df[i,3]),6])
}

df$new <- new
df
      y     x         yx   new
1 FALSE  TRUE  FALSETRUE 0.104
2 FALSE FALSE FALSEFALSE 0.312
3  TRUE  TRUE   TRUETRUE 0.400
4  TRUE FALSE  TRUEFALSE 0.200

Using sapply

new <- sapply(1:4, function(x) sum(df1[which(df1[,7]==df[x,3]),6]))

Upvotes: 2

Subsetting a dataframe based on another dataframe in R

Answers (3)

Related Questions