Reputation: 313
I have a set of five columns in a data table.
dt <- data.table(
city = c(rep(1,2), rep(2,2), rep(3,2), rep(4,2)),
neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
neighborhoods.2 = c(NA, "f", "g", rep(NA,5)),
neighborhoods.3 = c(NA, "h", rep(NA, 6)),
irrelevantdata = c(1:8)
)
city neighborhoods.1 neighborhoods.2 neighborhoods.3 irrelevantdata
1: 1 NA NA NA 1
2: 1 a f h 2
3: 2 b g NA 3
4: 2 c NA NA 4
5: 3 NA NA NA 5
6: 3 NA NA NA 6
7: 4 d NA NA 7
8: 4 e NA NA 8
I want to combine the first four columns into a single column.
neighborhood
1: 1
2: 1-a-f-h
3: 2-b-g
4: 2-c
5: 3
6: 3
7: 4-d
8: 4-e
As you can see, I'm removing NA
records and separating with a -
.
I've tried this, which has obvious problems in processing j
:
business[
,
neighborhood = paste0(
city,
if(!is.na(neighborhoods.1)) paste0("-", neighborhoods.1),
if(!is.na(neighborhoods.2)) paste0("-", neighborhoods.2),
if(!is.na(neighborhoods.3)) paste0("-", neighborhoods.3),
""
)
]
How can I get this done?
Updated to reflect that there are additional columns I don't want to combine.
Upvotes: 4
Views: 297
Reputation: 887118
One option would be to paste
the elements in rows together using do.call
, and then remove the NA
elements along with the extra -
in the output vector.
dt[,.(neighborhood = gsub('-NA|NA-', '',
do.call(paste, c(.SD, sep='-')))), .SDcols= city:neighborhoods.3]
Or another option is group by sequence of rows, unlist
the Subset of Data.table (.SD
), remove the NA elements (na.omit
), paste
the elements together. We can specify the columns to be used for this operation in .SDcols
.
dt[, .(neighbourhood = paste(na.omit(unlist(.SD)),collapse='-')) ,
by=1:nrow(dt), .SDcols= city:neighborhoods.3]
Or another option suggested by @Frank is to melt
the subset of dataset (specified by the columns that are needed) to long format and then paste
mycols <- setdiff(names(dt), 'irrelevantdata')
na.omit(melt(dt[, ..mycols][, r := .I],
id.var="r"))[, paste(value, collapse="-"), by=r]
Upvotes: 5