Reputation: 1
I have a data table which contains one million records and I try to create a new column based on month.idx
:
dt[, new_col := get(paset0("month_",month.idx)]
and it only works for the first line.
Can anyone help me with this problem? thanks!
Data
id month_1 month_2 month_3 month_4 month_5 month.idx
1: x1 1 1 1 0 1 3
2: x2 0 0 0 1 0 4
3: x3 1 0 0 0 0 1
4: x4 0 0 0 0 0 5
5: x5 1 1 0 0 1 2
6: x6 0 1 0 1 1 3
7: x7 0 0 1 1 1 4
8: x8 0 0 0 0 0 1
9: x9 0 0 0 0 1 5
results:
id month_1 month_2 month_3 month_4 month_5 month.idx new_col
1: x1 1 1 1 0 1 3 1
2: x2 0 0 0 1 0 4 0
3: x3 1 0 0 0 0 1 0
4: x4 0 0 0 0 0 5 0
5: x5 1 1 0 0 1 2 0
6: x6 0 1 0 1 1 3 0
7: x7 0 0 1 1 1 4 1
8: x8 0 0 0 0 0 1 0
9: x9 0 0 0 0 1 5 0
expected:
id month_1 month_2 month_3 month_4 month_5 month.idx new_col
1: x1 1 1 1 0 1 3 1
2: x2 0 0 0 1 0 4 1
3: x3 1 0 0 0 0 1 1
4: x4 0 0 0 0 0 5 0
5: x5 1 1 0 0 1 2 1
6: x6 0 1 0 1 1 3 0
7: x7 0 0 1 1 1 4 0
8: x8 0 0 0 0 0 1 0
9: x9 0 0 0 0 1 5 1
Upvotes: 0
Views: 47
Reputation: 25225
Here are 2 options:
1) Using get
row by row taking in Frank’s comment:
DT[, new_col := get(paste0("month_", month.idx)), by= month.idx]
2) Melt and then join to do a lookup
DT[, variable := paste0("month_", month.idx)]
DT[melt(DT, id.vars="id", measure.vars=patterns("^month_")),
on=.(id, variable), new_col := value]
Speed is dependent on the number of rows & month columns that you have.
data:
DT <- fread("id month_1 month_2 month_3 month_4 month_5 month.idx
x1 1 1 1 0 1 3
x2 0 0 0 1 0 4
x3 1 0 0 0 0 1
x4 0 0 0 0 0 5
x5 1 1 0 0 1 2
x6 0 1 0 1 1 3
x7 0 0 1 1 1 4
x8 0 0 0 0 0 1")
Upvotes: 2