iukie
iukie

Reputation: 82

R: Row wise operation on same column to produce N more columns

I'm trying to come up with multiple columns based on a column (path) in data table. My dataset is currently 6 millions rows.

EDIT: Included minimal dataset which is naive

set.seed(24);
DATA <- data.frame(path=paste0(sample(LETTERS[1:3], 25, replace=TRUE),">",sample(LETTERS[1:3], 25, replace=TRUE)), value=rnorm(25))

Following is code that I'm currently using (of course I'm not grepl'ing alphabets):

for (i in 1:nrow(DATA)) {
  if(grepl("A", DATA$path[i])){DATA$A[i]=1}else{DATA$A[i]=0}
  if(grepl("B", DATA$path[i])){DATA$B[i]=1}else{DATA$B[i]=0}
  if(grepl("C", DATA$path[i])){DATA$C[i]=1}else{DATA$C[i]=0}
}

An older version of the code I was using is:

DATA$A <- sapply(DATA$path, function(x) { if(grepl("A", x)){1}else{0}})
DATA$B <- sapply(DATA$path, function(x) { if(grepl("B", x)){1}else{0}})

for every column that I want to output.

but this is very inefficient as it sapply too many times.

My question is: is there a more efficient way to do the same thing? or am I stuck with the first code chunk?

Thanks in advance!

Upvotes: 1

Views: 81

Answers (2)

akrun
akrun

Reputation: 886938

We could split the 'path' column with > and get the unique elements in that column ('Un1'). Loop over 'Un1' and use grepl to find the match.

Un1 <- sort(unique(unlist(strsplit(as.character(DATA$path), '>'))))
DATA[Un1] <- lapply(Un1, function(x) as.integer(grepl(x, DATA$path)))

Or another option is using mtabulate from qdapTools on the strsplit output, negate (!) to convert 0 to 'TRUE' and all other values to 'FALSE', negate again so that 'FALSE' becomes 'TRUE' and viceversa, wrap it with +( to coerce the logical to integer. We can also use as.integer or +0L etc for this.

library(qdapTools)
cbind(DATA, +(!!mtabulate(strsplit(as.character(DATA$path), '>')) ))

Or after splitting the 'path' column, we can loop over the columns, apply model.matrix and use | with Reduce

 d1 <- do.call(rbind.data.frame,strsplit(as.character(DATA$path), '>'))
 names(d1) <- paste0('path', 1:2)
 cbind(DATA, +(Reduce(`|`,lapply(d1, function(x) model.matrix(~0+x)))))

Upvotes: 2

Colonel Beauvel
Colonel Beauvel

Reputation: 31161

Why not simply:

DATA$A = grepl('A', DATA$path) + 0L

Upvotes: 1

Related Questions