user697473

Reputation: 2293

create data frame of interactions between two factors

I have two 2x2 data frames. Each column in each data frame is a factor.

I want to create a 2x8 data frame that contains each factor and the interactions between factors.

Here is an example:

df1 <- data.frame(V1 = factor(c('a', 'b')), V2 = factor(c('c', 'd')))
df2 <- data.frame(V3 = factor(c('e', 'f')), V4 = factor(c('g', 'h')))
df.combined <- combine(df1, df2)

Where df.combined would be

V1 V2 V3 V4 V1:V3 V1:V4 V2:V3 V2:V4
 a  c  e  g   a:e   a:g   c:e   c:g
 b  d  f  h   b:f   b:h   d:f   d:h

(I don't want the V1:V2 or V3:V4 interactions. Not needing those interactions is just in the nature of the problem that I face.)

Is there a succinct way to get df.combined in R?

Upvotes: 2

Views: 2278

Answers (3)

Joshua

Reputation: 688

If the colons in the column names are not required, a single line of code both column-binds the two data frames and creates the interactions. Using your two data frames:

df.combined <- with(c(df1, df2), data.frame(df1, df2, V1:V3, V1:V4, V2:V3, V2:V4))

which gives

  V1 V2 V3 V4 V1.V3 V1.V4 V2.V3 V2.V4
1  a  c  e  g   a:e   a:g   c:e   c:g
2  b  d  f  h   b:f   b:h   d:f   d:h
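The one-liner works because, for two factors, `f1:f2` builds their interaction rather than a numeric sequence. A minimal sketch of that behaviour on its own (the names `f1`, `f2` and `inter` are made up for illustration):

```r
# Two small factors, analogous to V1 and V3 in the question
f1 <- factor(c("a", "b"))
f2 <- factor(c("e", "f"))

# For factors, `:` returns a factor whose values pair up the inputs element-wise
inter <- f1:f2

as.character(inter)  # "a:e" "b:f"
levels(inter)        # all level combinations: "a:e" "a:f" "b:e" "b:f"
```

Note that the result keeps every combination of levels, including those that do not occur in the data; `droplevels()` removes the unused ones if that matters.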

If you need the colons in the names, another one-liner will change the periods to colons:

colnames(df.combined) <- gsub("\\.", ":", colnames(df.combined))

leaving the final result

  V1 V2 V3 V4 V1:V3 V1:V4 V2:V3 V2:V4
1  a  c  e  g   a:e   a:g   c:e   c:g
2  b  d  f  h   b:f   b:h   d:f   d:h

Upvotes: 0

IRTFM

Reputation: 263332

I'm not sure whether this meets your definition of "succinct".

dfc <- cbind(df1,df2)
dfc2<- cbind( dfc, `V1:V3`=interaction(dfc$V1, dfc$V3, sep=":"), 
                   `V1:V4`=interaction(dfc$V1,dfc$V4, sep=":") )
df.combined <- cbind( dfc2, `V2:V3`=interaction(dfc$V2, dfc$V3, sep=":"), 
                            `V2:V4`=interaction(dfc$V2,dfc$V4, sep=":") )
> df.combined
  V1 V2 V3 V4 V1:V3 V1:V4 V2:V3 V2:V4
1  a  c  e  g   a:e   a:g   c:e   c:g
2  b  d  f  h   b:f   b:h   d:f   d:h

(It is generally not recommended to have colons in variable names. They will then always need to be quoted.)
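To illustrate the quoting issue, here is a small sketch; the data frame `d` and the variables `vals` and `d2` are invented for this example and are not part of the answer above:

```r
d <- data.frame(V1 = factor(c("a", "b")), V3 = factor(c("e", "f")))

# [[<- keeps the name exactly as given, colon included
d[["V1:V3"]] <- interaction(d$V1, d$V3, sep = ":")

# With `$` the name must be wrapped in backticks;
# unquoted, d$V1:V3 would be parsed as (d$V1):V3
vals <- as.character(d$`V1:V3`)

# data.frame() with the default check.names = TRUE rewrites the colon to a period
d2 <- data.frame(`V1:V3` = vals)
names(d2)  # "V1.V3"
```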

Upvotes: 2

bdemarest

Reputation: 14667

Here is one solution. Maybe not terribly elegant or succinct, but possibly useful...

dat <- data.frame(V1=c("a", "b"),
                  V2=c("c", "d"),
                  V3=c("e", "f"), 
                  V4=c("g", "h"))

factor_pairs <- expand.grid(c("V1", "V2"), 
                            c("V3", "V4"),
                            stringsAsFactors=FALSE)

for (i in seq_len(nrow(factor_pairs))) {
    factor_1 <- factor_pairs[i, 1]
    factor_2 <- factor_pairs[i, 2]
    new_col_name <- paste(factor_1, factor_2, sep=":")
    dat[[new_col_name]] <- paste(dat[[factor_1]], dat[[factor_2]], sep=":")
}

dat
#   V1 V2 V3 V4 V1:V3 V2:V3 V1:V4 V2:V4
# 1  a  c  e  g   a:e   c:e   a:g   c:g
# 2  b  d  f  h   b:f   d:f   b:h   d:h
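The same idea could be wrapped in a reusable function, roughly in the spirit of the `combine()` call in the question. This is only a sketch: the name `combine_interactions` is hypothetical, and it assumes the two data frames have the same number of rows and that every column should be crossed with every column of the other frame. It uses `interaction()` instead of `paste()` so that the new columns stay factors.

```r
# Hypothetical helper: bind two data frames and add every cross-frame interaction
combine_interactions <- function(x, y, sep = ":") {
  out <- cbind(x, y)
  for (a in names(x)) {
    for (b in names(y)) {
      # drop = TRUE discards level combinations that never occur in the data
      out[[paste(a, b, sep = sep)]] <- interaction(x[[a]], y[[b]],
                                                   sep = sep, drop = TRUE)
    }
  }
  out
}

df1 <- data.frame(V1 = factor(c("a", "b")), V2 = factor(c("c", "d")))
df2 <- data.frame(V3 = factor(c("e", "f")), V4 = factor(c("g", "h")))
df.combined <- combine_interactions(df1, df2)
```

Looping over the columns of `x` first gives the column order shown in the question (V1:V3, V1:V4, V2:V3, V2:V4), and no within-frame interactions such as V1:V2 are created.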

Upvotes: 0
