Amanda

R: Looping custom function with paired variables?

I am comparing scores from specific columns in Site_Compile:

tail(Site_Compile)
       Node_10_State Node_11_State Node_12_State 
19231             0             1             -             
19232             1             1             1             
19233             0             0             0             
19234             0             0             0             
19235             0             0             0            
19236             1             1             1 

using the following nested ifelse() block for each new column:

Gain_Loss_Track$Node2_vs_Node3 <- ifelse(Site_Compile$Node_2_State == '-' | Site_Compile$Node_3_State == '-', '?',
                                    ifelse(Site_Compile$Node_2_State < Site_Compile$Node_3_State, 'Loss',
                                        ifelse(Site_Compile$Node_2_State > Site_Compile$Node_3_State, 'Gain',
                                            ifelse(Site_Compile$Node_2_State == Site_Compile$Node_3_State, 'Same', 'Other'))))

to generate a new data frame that looks like this:

tail(Gain_Loss_Track)
             X_vs_Node1  Node1_vs_Node2   Y_vs_Node2    Node2_vs_Node3  
19231              Same           Same        Same           Same      
19232              Same           Same        Same           Same      
19233              Loss           Same        Same           Same      
19234              Loss           Same        Same           Same      
19235              Same           Same        Same           Same      
19236              Same           Same        Same           Same          

I would like to loop this to save space (and reduce the room for error), but I only need to compare specific pairs of columns from the dataset; I don't want to loop through every possible comparison. The pairs aren't neat: I couldn't automate it by comparing columns i and i+1 or any other regular pattern, so I have to list the pairs to compare somehow.

I think I should be able to do this with a for loop and a custom function, but I am stuck on how to loop over many pairs of variables.


Answers (1)

diomedesdata

A data.table solution using fcase().

For one pair of columns

library(data.table)

setDT(Site_Compile)

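compare_columns <- function(x,y){
  # fcase() returns the value paired with the first condition that is TRUE
  fcase(x == '-' | y == '-', "?",   # missing state in either column
        x < y, 'Loss',
        x > y, 'Gain',
        x == y, 'Same', 
        default = 'Other')          # anything not caught above
}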

Site_Compile[, Node2_vs_Node3 := compare_columns(Node_2_State, Node_3_State)]
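If data.table is not available, roughly the same single-pair logic can be written in base R by wrapping the question's nested ifelse() in a function. This is a sketch rather than part of the answer: compare_pair() and toy_df are made-up names, and ifelse() stands in for fcase().

```r
# Base R stand-in for compare_columns(): same rules, nested ifelse()
# (compare_pair is a hypothetical name)
compare_pair <- function(x, y) {
  ifelse(x == "-" | y == "-", "?",
         ifelse(x < y, "Loss",
                ifelse(x > y, "Gain",
                       ifelse(x == y, "Same", "Other"))))
}

# Toy data standing in for Site_Compile; the '-' placeholder
# means the state columns are stored as character
toy_df <- data.frame(Node_2_State = c("0", "1", "-", "1"),
                     Node_3_State = c("1", "0", "0", "1"),
                     stringsAsFactors = FALSE)

toy_df$Node2_vs_Node3 <- compare_pair(toy_df$Node_2_State, toy_df$Node_3_State)
print(toy_df)
```

Because the columns are character, < and > compare strings here (as they do in the fcase() version). That is fine for single-digit states like 0 and 1, but "10" would sort below "9".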

For multiple pairs of columns

To handle multiple pairs, you can modify compare_columns() slightly so that it takes a pair of column names plus the table, and then apply it to a list of pairs with lapply():

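# z is a length-2 character vector of column names; DT is the data.table
compare_columns <- function(z, DT){
  x <- z[1]; y <- z[2]
  # DT[[name]] extracts a column by its name given as a string
  fcase(DT[[x]] == '-' | DT[[y]] == '-', "?",
        DT[[x]] < DT[[y]], 'Loss',
        DT[[x]] > DT[[y]], 'Gain',
        DT[[x]] == DT[[y]], 'Same', 
        default = 'Other')
}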

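# One output name per pair, in the same order as data_column_pairs
new_col_names <- paste0("Node",1:10,"_vs_","Node",11:20)

# Each element is c(first_column, second_column)
data_column_pairs <- list(c("first_var1","first_var2"),
  c("second_var1","second_var2"), ...)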

Site_Compile[, (new_col_names) := 
  lapply(data_column_pairs, compare_columns, DT = Site_Compile)]
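Since the question asked specifically about a for loop, here is the same pair-list idea as a plain base R loop that writes into a separate results table, like Gain_Loss_Track. A minimal sketch, assuming the states are stored as character: compare_pair(), site_df and column_pairs are hypothetical, and each output name is built from its pair so the names cannot drift out of sync with the list.

```r
# Base R sketch: loop over a hand-written list of column pairs.
# compare_pair(), site_df and column_pairs are hypothetical stand-ins.
compare_pair <- function(x, y) {
  ifelse(x == "-" | y == "-", "?",
         ifelse(x < y, "Loss",
                ifelse(x > y, "Gain",
                       ifelse(x == y, "Same", "Other"))))
}

site_df <- data.frame(Node_1_State = c("0", "1", "1"),
                      Node_2_State = c("1", "1", "-"),
                      Node_3_State = c("0", "1", "0"),
                      Node_5_State = c("1", "0", "0"),
                      stringsAsFactors = FALSE)

# Only the comparisons you want; no regular pattern is required
column_pairs <- list(c("Node_1_State", "Node_2_State"),
                     c("Node_2_State", "Node_3_State"),
                     c("Node_1_State", "Node_5_State"))

# Empty data frame with one row per input row; columns are added in the loop
gain_loss <- data.frame(row.names = seq_len(nrow(site_df)))
for (p in column_pairs) {
  short <- sub("_State$", "", p)                 # "Node_1_State" -> "Node_1"
  new_name <- paste(short[1], short[2], sep = "_vs_")
  gain_loss[[new_name]] <- compare_pair(site_df[[p[1]]], site_df[[p[2]]])
}
print(gain_loss)
```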

For example, on the built-in mtcars dataset:

data(mtcars)

library(data.table)
setDT(mtcars)

data_column_pairs <- list(c("gear","carb"),
                          c("drat", "wt"),
                          c("mpg","qsec"))

new_col_names <- c("first","second","third")

mtcars[, (new_col_names) := lapply(data_column_pairs, compare_columns, DT = mtcars)]

mtcars
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb first second third
 1: 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  Same   Gain  Gain
 2: 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Same   Gain  Gain
 3: 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  Gain   Gain  Gain
 4: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1  Gain   Loss  Gain
 5: 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2  Gain   Loss  Gain
 6: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1  Gain   Loss  Loss
 7: 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4  Loss   Loss  Loss
 8: 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2  Gain   Gain  Gain
 9: 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2  Gain   Gain  Loss

There is probably a nicer way to define compare_columns(), but I played around with it a bit and this is the first way I could get it to work.
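One possibly tidier arrangement (an alternative, not the answer's method) keeps a plain two-argument comparison function and lists the pairs in a small table, which is easy to maintain when the pairs follow no pattern. Map() then walks the left and right column names in parallel. This sketch uses base R on a copy of mtcars; compare_pair() and pair_table are hypothetical names. It should reproduce the first, second and third columns shown above.

```r
# Hypothetical alternative: two-argument comparison + a table of pairs
compare_pair <- function(x, y) {
  ifelse(x == "-" | y == "-", "?",
         ifelse(x < y, "Loss",
                ifelse(x > y, "Gain",
                       ifelse(x == y, "Same", "Other"))))
}

# One row per comparison: left column, right column, result name
pair_table <- data.frame(left  = c("gear", "drat", "mpg"),
                         right = c("carb", "wt", "qsec"),
                         name  = c("first", "second", "third"),
                         stringsAsFactors = FALSE)

cars <- mtcars  # work on a copy of the built-in data set
results <- Map(function(a, b) compare_pair(cars[[a]], cars[[b]]),
               pair_table$left, pair_table$right)
cars[pair_table$name] <- results  # list elements are assigned in order

head(cars[pair_table$name], 9)
```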
