Antti

Reputation: 1293

How to build binary data.frame in R for multiple dimensions?

I have a data frame with three variables, two of which are coded as binary (dummy) columns and the third of which is an integer:

       DATA   YEAR1   YEAR2   REGION1   REGION2
OBS1   X      1        0      1         0  
OBS2   Y      1        0      0         1
OBS3   Z      0        1      1         0

etc.

Now I want to transform it into something like this:

       YEAR1_REGION1   YEAR1_REGION2   YEAR2_REGION1   YEAR2_REGION2
OBS1   X               0               0               0
OBS2   0               Y               0               0
OBS3   0               0               Z               0

Basic matrix multiplication is not what I'm after. I'm looking for a neat way to do this that also names the columns automatically. My actual data has three factor dimensions with 20*8*6 combinations, so there will be 960 columns in total.

Upvotes: 2

Views: 113

Answers (2)

dickoa

Reputation: 18437

Here's another approach, based on outer() and similar to @Roland's answer.

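# DF is the example data frame from the question (as read in @Roland's answer)
year <- grep("YEAR", names(DF), value = TRUE)
region <- grep("REGION", names(DF), value = TRUE)
data <- as.character(DF$DATA)

# Multiply every YEAR dummy with every REGION dummy: 1 where both are set
df <- outer(year, region, function(x, y) DF[,x] * DF[,y])
colnames(df) <- outer(year, region, paste, sep = "_")
df <- as.data.frame(df)

# Replace each 1 with the corresponding DATA value
for (i in seq_len(ncol(df)))
    df[as.logical(df[,i]), i] <- data[as.logical(df[,i])]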

df
##      YEAR1_REGION1 YEAR2_REGION1 YEAR1_REGION2 YEAR2_REGION2
## OBS1             X             0             0             0
## OBS2             0             0             Y             0
## OBS3             0             Z             0             0
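
For the real data with three dummy groups, a variation along the following lines might scale more easily. This is only a sketch under some assumptions: the columns of each group share a common prefix, every row has exactly one 1 per group, and the names groups, labels, combo and wide are made up for illustration. It collapses each group to a single label per row, combines the labels with interaction(), and fills a character matrix by (row, column) indexing.

```r
DF <- read.table(text = "       DATA   YEAR1   YEAR2   REGION1   REGION2
OBS1   X      1        0      1         0
OBS2   Y      1        0      0         1
OBS3   Z      0        1      1         0", header = TRUE)

# Assumed prefixes of the dummy groups; add a third one for the real data
groups <- c("YEAR", "REGION")

# Collapse each group of dummies into one factor (assumes one 1 per row and group)
labels <- lapply(groups, function(p) {
  cols <- grep(paste0("^", p), names(DF), value = TRUE)
  hit  <- max.col(as.matrix(DF[cols]), ties.method = "first")
  factor(cols[hit], levels = cols)
})

# One level per combination, named e.g. "YEAR1_REGION1"
combo <- interaction(labels, sep = "_")

# Start with "0" everywhere, then write DATA into the matching column of each row
wide <- matrix("0", nrow = nrow(DF), ncol = nlevels(combo),
               dimnames = list(rownames(DF), levels(combo)))
wide[cbind(seq_len(nrow(DF)), as.integer(combo))] <- as.character(DF$DATA)
wide <- as.data.frame(wide, stringsAsFactors = FALSE)

wide
```

Adding a third prefix to groups should be the only change needed for three dimensions, since interaction() accepts any number of factors.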

Upvotes: 4

Roland

Reputation: 132626

Others may come up with something more succinct, but this produces the expected result:

DF <- read.table(text="       DATA   YEAR1   YEAR2   REGION1   REGION2
OBS1   X      1        0      1         0  
OBS2   Y      1        0      0         1
OBS3   Z      0        1      1         0", header=TRUE)

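# Convert the dummy columns to logical and the data column to character
DF[,-1] <- lapply(DF[,-1], as.logical)
DF[,1] <- as.character(DF[,1])

# One pass per (YEAR, REGION) column pair; i holds the two column indices
res <- apply(expand.grid(2:3, 4:5), 1, function(i) {
  tmp <- rep("0", length(DF[,1]))
  ind <- do.call(`&`,DF[,i])  # rows where both dummies are TRUE
  tmp[ind] <- DF[ind,1]
  tmp <- list(tmp)
  names(tmp) <- paste0(names(DF)[i], collapse="_")
  tmp
})

res <- as.data.frame(res)
rownames(res) <- rownames(DF)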


#      YEAR1_REGION1 YEAR2_REGION1 YEAR1_REGION2 YEAR2_REGION2
# OBS1             X             0             0             0
# OBS2             0             0             Y             0
# OBS3             0             Z             0             0

However, I suspect there is a much better way to achieve what you actually want to do, without creating a huge wide-format data.frame.
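
As one illustration of what avoiding the wide format could look like (a guess at the intent, not a definitive recommendation), the dummy columns can be collapsed into one column per dimension. The sketch assumes each row has exactly one 1 among the YEAR columns and one among the REGION columns; the names year_cols, region_cols and long are made up for the example.

```r
DF <- read.table(text = "       DATA   YEAR1   YEAR2   REGION1   REGION2
OBS1   X      1        0      1         0
OBS2   Y      1        0      0         1
OBS3   Z      0        1      1         0", header = TRUE)

year_cols   <- grep("^YEAR",   names(DF), value = TRUE)
region_cols <- grep("^REGION", names(DF), value = TRUE)

# For each row, take the name of the dummy column that holds the 1
long <- data.frame(
  DATA   = as.character(DF$DATA),
  YEAR   = year_cols[max.col(as.matrix(DF[year_cols]), ties.method = "first")],
  REGION = region_cols[max.col(as.matrix(DF[region_cols]), ties.method = "first")],
  row.names = rownames(DF),
  stringsAsFactors = FALSE
)

long
```

This keeps three columns no matter how many levels each dimension has, and the year/region combination of a row is still available via paste(long$YEAR, long$REGION, sep = "_") if it is needed later.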

Upvotes: 4
