I am comparing scores from specific columns in Site_Compile:
tail(Site_Compile)
      Node_10_State Node_11_State Node_12_State
19231             0             1             -
19232             1             1             1
19233             0             0             0
19234             0             0             0
19235             0             0             0
19236             1             1             1
using the following block for each new column:
Gain_Loss_Track$Node2_vs_Node3 <- ifelse(Site_Compile$Node_2_State == '-' | Site_Compile$Node_3_State == '-', '?',
  ifelse(Site_Compile$Node_2_State < Site_Compile$Node_3_State, 'Loss',
    ifelse(Site_Compile$Node_2_State > Site_Compile$Node_3_State, 'Gain',
      ifelse(Site_Compile$Node_2_State == Site_Compile$Node_3_State, 'Same', 'Other'))))
to generate a new data frame that looks like this:
tail(Gain_Loss_Track)
      X_vs_Node1 Node1_vs_Node2 Y_vs_Node2 Node2_vs_Node3
19231       Same           Same       Same           Same
19232       Same           Same       Same           Same
19233       Loss           Same       Same           Same
19234       Loss           Same       Same           Same
19235       Same           Same       Same           Same
19236       Same           Same       Same           Same
I would love to loop this to save space (and reduce the room for error), but I only need to compare specific pairs of columns, not every possible combination. The pairs don't follow a neat pattern: I couldn't automate it by comparing columns i and i+1 or by any other regular rule, so I have to list the pairs to compare somehow.
I think I should be able to do this with a for loop and a custom function, but I'm stuck on how to loop over many pairs of variables.
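A minimal base-R sketch of the for-loop idea: the nested ifelse() block is wrapped in a function, and the pairs are listed once in a named list whose names become the new column names. The toy data, the helper name compare_pair() and the pair list are hypothetical placeholders, not the real Site_Compile. It relies on the states being single characters ('0', '1', '-'), so comparing them as strings gives the same order as comparing them as numbers.

```r
# Hypothetical toy data standing in for Site_Compile (names and values made up).
Site_Compile <- data.frame(
  Node_2_State = c("0", "1", "1", "-"),
  Node_3_State = c("1", "0", "1", "0"),
  Node_5_State = c("0", "0", "-", "1"),
  stringsAsFactors = FALSE
)

# Same nested-ifelse logic as the per-column block, wrapped in a reusable
# (hypothetical) helper that takes the two columns as vectors.
compare_pair <- function(x, y) {
  ifelse(x == "-" | y == "-", "?",
    ifelse(x < y, "Loss",
      ifelse(x > y, "Gain",
        ifelse(x == y, "Same", "Other"))))
}

# The pairs are listed explicitly because they follow no pattern;
# each list name becomes the name of the resulting column.
pairs <- list(
  Node2_vs_Node3 = c("Node_2_State", "Node_3_State"),
  Node3_vs_Node5 = c("Node_3_State", "Node_5_State")
)

# Empty data frame with the same rows, filled one pair at a time.
Gain_Loss_Track <- data.frame(row.names = row.names(Site_Compile))
for (nm in names(pairs)) {
  p <- pairs[[nm]]
  Gain_Loss_Track[[nm]] <- compare_pair(Site_Compile[[p[1]]], Site_Compile[[p[2]]])
}
Gain_Loss_Track
```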
A data.table solution using fcase():
library(data.table)
setDT(Site_Compile)
compare_columns <- function(x, y){
  fcase(x == '-' | y == '-', "?",
        x < y,  'Loss',
        x > y,  'Gain',
        x == y, 'Same',
        default = 'Other')
}
Site_Compile[, Node2_vs_Node3 := compare_columns(Node_2_State, Node_3_State)]
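A self-contained run of the two-argument version on a small made-up table; the column names mimic the question, but the values are invented so that each row hits a different branch:

```r
library(data.table)

# Two-argument compare_columns() from above, repeated so this block runs alone.
compare_columns <- function(x, y) {
  fcase(x == "-" | y == "-", "?",
        x < y,  "Loss",
        x > y,  "Gain",
        x == y, "Same",
        default = "Other")
}

# Hypothetical toy table; values are invented for illustration.
Site_Compile <- data.table(
  Node_2_State = c("0", "1", "1", "-"),
  Node_3_State = c("1", "0", "1", "0")
)

# := adds the column by reference, so Site_Compile itself gains Node2_vs_Node3.
Site_Compile[, Node2_vs_Node3 := compare_columns(Node_2_State, Node_3_State)]
Site_Compile
```

The four rows come out as Loss, Gain, Same and ? respectively.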
And then, for multiple columns, you could do the following, modifying compare_columns() slightly:
compare_columns <- function(z, DT){
  # z is a length-2 character vector holding the names of the two columns to compare
  x <- z[1]; y <- z[2]
  fcase(DT[[x]] == '-' | DT[[y]] == '-', "?",
        DT[[x]] <  DT[[y]], 'Loss',
        DT[[x]] >  DT[[y]], 'Gain',
        DT[[x]] == DT[[y]], 'Same',
        default = 'Other')
}
# one new column name per pair, in the same order as data_column_pairs
new_col_names <- paste0("Node",1:10,"_vs_","Node",11:20)
data_column_pairs <- list(c("first_var1","first_var2"),
                          c("second_var1","second_var2"), ...)
Site_Compile[, (new_col_names) :=
               lapply(data_column_pairs, compare_columns, DT = Site_Compile)]
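new_col_names has to contain exactly one name per pair, in the same order. One way to keep the two from drifting apart is to derive the names from the pairs themselves. This sketch assumes every column follows the Node_<n>_State pattern seen in the question; the toy data and the regular expression are illustrative assumptions.

```r
library(data.table)

# List-based compare_columns() from above: z holds two column names,
# DT is the table to look them up in.
compare_columns <- function(z, DT) {
  x <- z[1]; y <- z[2]
  fcase(DT[[x]] == "-" | DT[[y]] == "-", "?",
        DT[[x]] <  DT[[y]], "Loss",
        DT[[x]] >  DT[[y]], "Gain",
        DT[[x]] == DT[[y]], "Same",
        default = "Other")
}

# Hypothetical toy data standing in for Site_Compile.
Site_Compile <- data.table(
  Node_2_State = c("0", "1", "1", "-"),
  Node_3_State = c("1", "0", "1", "0"),
  Node_5_State = c("0", "0", "-", "1")
)

# The pairs are still listed by hand.
data_column_pairs <- list(c("Node_2_State", "Node_3_State"),
                          c("Node_3_State", "Node_5_State"))

# The output names are derived from the pairs, e.g.
# c("Node_2_State", "Node_3_State") becomes "Node2_vs_Node3".
# Assumes every column is named Node_<digits>_State.
new_col_names <- vapply(
  data_column_pairs,
  function(p) paste(sub("^Node_([0-9]+)_State$", "Node\\1", p), collapse = "_vs_"),
  character(1)
)

Site_Compile[, (new_col_names) := lapply(data_column_pairs, compare_columns, DT = Site_Compile)]
Site_Compile
```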
For example, on the mtcars dataset:
data(mtcars)
library(data.table)
setDT(mtcars)
data_column_pairs <- list(c("gear","carb"),
c("drat", "wt"),
c("mpg","qsec"))
new_col_names <- c("first","second","third")
mtcars[, (new_col_names) := lapply(data_column_pairs, compare_columns, DT = mtcars)]
mtcars
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb first second third
 1: 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  Same   Gain  Gain
 2: 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Same   Gain  Gain
 3: 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  Gain   Gain  Gain
 4: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1  Gain   Loss  Gain
 5: 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2  Gain   Loss  Gain
 6: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1  Gain   Loss  Loss
 7: 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4  Loss   Loss  Loss
 8: 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2  Gain   Gain  Gain
 9: 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2  Gain   Gain  Loss
There is probably a nicer way to define compare_columns(), but I played around with it a bit and this is the first way I could get it to work.
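One possibly tidier alternative, sketched under the same toy-data assumptions: keep the original two-argument compare_columns(x, y) unchanged and let base R's Map() walk two parallel vectors of column names, so the comparison logic never has to know about the table. The names first_cols and second_cols and the data are hypothetical.

```r
library(data.table)

# Vector-level comparison, unchanged from the single-column version.
compare_columns <- function(x, y) {
  fcase(x == "-" | y == "-", "?",
        x < y,  "Loss",
        x > y,  "Gain",
        x == y, "Same",
        default = "Other")
}

# Hypothetical toy data (placeholder names and values).
Site_Compile <- data.table(
  Node_2_State = c("0", "1", "1", "-"),
  Node_3_State = c("1", "0", "1", "0"),
  Node_5_State = c("0", "0", "-", "1")
)

# Two parallel vectors: first and second column of each pair.
# The names on first_cols are used as the new column names.
first_cols  <- c(Node2_vs_Node3 = "Node_2_State", Node3_vs_Node5 = "Node_3_State")
second_cols <- c("Node_3_State", "Node_5_State")

# Map() pairs the vectors element by element and passes each pair of
# columns to the unchanged two-argument compare_columns().
results <- Map(function(a, b) compare_columns(Site_Compile[[a]], Site_Compile[[b]]),
               first_cols, second_cols)

Site_Compile[, (names(results)) := results]
Site_Compile
```

Because first_cols is named, Map() returns a named list, so the new column names travel with the pairs instead of living in a separate vector.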