user61395

Reputation: 15

Convert Character Matrix to TRUE/FALSE Matrix based on column names

I have a data frame in the following format:

    1 2 a b c
1   a b 0 0 0
2   b   0 0 0
3   c   0 0 0

I want to fill columns a through c with a TRUE/FALSE flag (shown as 1/0 below) indicating whether the column name appears in column 1 or column 2 of that row:

    1 2 a b c
1   a b 1 1 0
2   b   0 1 0
3   c   0 0 1

I have a dataset of about 530,000 records, 4 description columns, and 95 output columns, so a nested for loop is impractical. I tried code in the following form, but it was too slow:

for (i in 3:5) {
  for (j in 1:3) {
    for (k in 1:2) {
      if (df[j, k] == colnames(df)[i]) df[j, i] <- 1
    }
  }
}

Is there an easier, more efficient way to achieve the same output?

Thanks in advance!

Upvotes: 1

Views: 118

Answers (1)

akrun

Reputation: 887048

One option is mtabulate from qdapTools:

library(qdapTools)
df1[-(1:2)] <- mtabulate(as.data.frame(t(df1[1:2])))[-3]
df1
#  1 2 a b c
#1 a b 1 1 0
#2 b   0 1 0
#3 c   0 0 1

Or we can melt the first two columns after converting them to a matrix, use table to get the frequency of each value per row, and assign the output to the numeric columns:

library(reshape2)
df1[-(1:2)] <- table(melt(as.matrix(df1[1:2]))[-2])[,-1]
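The same melt-then-tabulate idea can probably be done without reshape2. The following is only a base R sketch, assuming the output column names are exactly the values to look for; the names m and tab are mine and not part of the answer above.

```r
# Sample data matching the question (rebuilt here so the block runs on its own)
df1 <- data.frame(`1` = c("a", "b", "c"), `2` = c("b", "", ""),
                  a = 0L, b = 0L, c = 0L,
                  check.names = FALSE, stringsAsFactors = FALSE)

# Description columns as a character matrix
m <- as.matrix(df1[1:2])

# Pair each cell's row index with its value, then cross-tabulate.
# Restricting the factor levels to the output column names makes
# empty strings (and any other value) NA, so table() drops them.
tab <- table(row   = as.vector(row(m)),
             value = factor(as.vector(m), levels = names(df1)[-(1:2)]))

# Convert the contingency table to a data frame and assign it
df1[-(1:2)] <- as.data.frame.matrix(tab)
df1
```

Note that this produces counts, so a value that appeared in both description columns of the same row would give 2 rather than 1; comparing the result with 0 would turn it into a strict indicator.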

Or we can 'paste' the first two columns and use cSplit_e to get the binary format.

library(splitstackshape)
cbind(df1[1:2], cSplit_e(as.data.table(do.call(paste, df1[1:2])),
                   'V1', ' ', type='character', fill=0, drop=TRUE))
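For comparison, here is a minimal package-free sketch that replaces the three nested loops with one vectorized comparison per output column. It assumes the description columns come first and that every remaining column name is a value to match; desc_cols, out_cols and desc are hypothetical names introduced for illustration.

```r
# Sample data matching the question (rebuilt here so the block runs on its own)
df1 <- data.frame(`1` = c("a", "b", "c"), `2` = c("b", "", ""),
                  a = 0L, b = 0L, c = 0L,
                  check.names = FALSE, stringsAsFactors = FALSE)

desc_cols <- 1:2                        # positions of the description columns
out_cols  <- names(df1)[-desc_cols]     # names of the indicator columns to fill

# Character matrix of the description columns, built once
desc <- as.matrix(df1[desc_cols])

# For each output column, flag rows where any description cell equals its name
df1[out_cols] <- lapply(out_cols, function(nm) {
  as.integer(rowSums(desc == nm, na.rm = TRUE) > 0)
})
df1
```

This still loops over the output columns (95 in the real data), but each pass works on whole columns at once instead of touching one cell at a time, which is usually where the time goes in the nested version.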

data

df1 <- structure(list(`1` = c("a", "b", "c"), `2` = c("b", "", ""), 
a = c(0L, 0L, 0L), b = c(0L, 0L, 0L), c = c(0L, 0L, 0L)), .Names = c("1", 
"2", "a", "b", "c"), class = "data.frame", row.names = c("1", 
"2", "3"))

Upvotes: 1
