Reputation: 1043
I have a sparse data table that looks like this:
data = data.table(
var1 = c("a","",""),
var2 = c("","","c"),
var3 = c("a","b",""),
var4 = c("","b","")
)
var1 var2 var3 var4
1: a a
2: b b
3: c
I would like to add a column that contains a string of zeros and ones indicating which variables are present in any row, like this:
var1 var2 var3 var4 concat
1: a a 1|0|1|0
2: b b 0|0|1|1
3: c 0|1|0|0
I can get to this with the following command:
data[, concat := paste(
as.integer(var1 != ""),
as.integer(var2 != ""),
as.integer(var3 != ""),
as.integer(var4 != ""),
sep = "|")]
However, if I have hundreds of variables, I would rather use some sort of computation to get to the desired expression. Perhaps something based on paste0("var",1:4)
, or at least a vector of column names. Any suggestions?
Upvotes: 5
Views: 186
Reputation: 1043
Thanks for the many and so diverse solutions. I am impressed!
I did some benchmarking on my large dataset to compare running time of several of the different approaches. Here is what I found:
data[ , concat := apply(.SD, 1, function(x) paste(+(x == ""), collapse = "|"))]
Time: 6 min, 41 s
data[, concat := do.call(paste, c(lapply(.SD, function(x) (x!="")+0 ), sep="|")) ]
Time: 10 min, 26 s
data[,concat := paste0(as.integer(.SD != ""), collapse = "|"), by = 1:nrow(data)]
Time: > 40 min (manually killed)
Upvotes: 2
Reputation: 93813
Variation not requiring any grouping by each row or apply
-ing on each row.
data[, concat := do.call(paste, c(lapply(.SD, function(x) (x!="")+0 ), sep="|")) ]
# var1 var2 var3 var4 concat
#1: a a 1|0|1|0
#2: b b 0|0|1|1
#3: c 0|1|0|0
Upvotes: 4
Reputation: 214957
Another option is to group the data by row and paste each row together:
data[,concat := paste0(as.integer(.SD != ""), collapse = "|"), by = 1:nrow(data)]
data
# var1 var2 var3 var4 concat
#1: a a 1|0|1|0
#2: b b 0|0|1|1
#3: c 0|1|0|0
Upvotes: 1
Reputation: 34703
Same basic approach as above:
data[ , concat := apply(.SD, 1, function(x) paste(+(x == ""), collapse = "|"))][]
# var1 var2 var3 var4 concat
# 1: a a 0|1|0|1
# 2: b b 1|1|0|0
# 3: c 1|0|1|1
Upvotes: 4
Reputation: 3288
data$concat <- apply(apply(data, 2, function(x) ifelse(x == "", 0, 1)), 1, function(x) paste(x, collapse="|"))
Breakdown:
1) For each column in data
, check if element is empty, if so return 0, else 1
apply(data, 2, function(x) ifelse(x == "", 0, 1))
Let's call the return from (1) the variable concat
. For each row of concat
, paste everything together with a pipe (|
) separating them. Set the new column of data
to equal this.
apply(concat, 1, function(x) paste(x, collapse="|"))
Upvotes: 2