mgtrek
mgtrek

Reputation: 25

Maximum values for two columns, need to use loops?

I've been trying to use loops but they just aren't working, since I haven't done anything with this complexity before and I have been trying to solve this with dplyer, which has been an impossible exercise. I don't know what to do. Can anyone suggest any solutions?

dat <- read.table(text = "ID    A_1   A_2    B_1   B_2    C_1   C_2    D_1   D_2    E_1   E_2    F_1   F_2    G_1   G_2
 11      1       2       3       4       3       4       3       4       3       4       3       4       3       4
 32      5       6       7       8       6       7       6       7       6       7       6       7       6       7
 73      15       15       10       10       3       4        3       4        3       4        3       4       2       2
 84      13       13       15       15       4       4        3       4        3       4        3       4       2       2
 65      2       2       2       2      2       2       2       2      2       2       2       1      2       2
                  ", header = TRUE)

The above was easy enough with dplyer conditioning. However, the below was impossible to do.

And then

Any guidance would be appreciated!

EDIT: Expected output The yellow is the terminal comparison, the rest of the row whether column _1 or _2 is selected is based on the yellow item was _1 or _2

The yellow is the terminal comparison, the rest of the row whether column _1 or _2 is selected is based on the yellow item was _1 or _2

Upvotes: 0

Views: 103

Answers (1)

econometrica_33
econometrica_33

Reputation: 83

Have you considered using case_when() in tidyverse? It allows for complex vectorized conditional statements. For example, although we do not have your expected output, this code structure could help you achieve some of your goals.

> dat <- dat %>% mutate(test = case_when(
+     A_1 == A_2 ~ case_when( #If A_1 and A_2 are the same, then check...
+         B_1 == B_2 ~ "wow",  #Whether B_1 and B_2 are the same and do "wow"
+         B_1 != B_2 ~ "cool"  #Whether they are different, and do "cool"
+     ),
+     A_1 != A_2 ~ "diff" #If the A's are not the same, define "diff"
+ ))

To understand what is going on here, the first case_when() is checking whether A_1 is the same as A_2 (for each row, it does it as a vector). If it is, then it moves to the next case_when() and checks whether B_1 is equal to B_2, or if they are different. You can see the output here. I defined a new column test based on these nested, vectorizable conditions, removing the need for for loops.

  ID A_1 A_2 B_1 B_2 C_1 C_2 D_1 D_2 E_1 E_2 F_1 F_2 G_1 G_2  test
1 11   1   2   3   4   3   4   3   4   3   4   3   4   3   4  diff
2 32   5   6   7   8   6   7   6   7   6   7   6   7   6   7  diff
3 73  15  15  10  10   3   4   3   4   3   4   3   4   2   2   wow
4 84  13  13  15  15   4   4   3   4   3   4   3   4   2   2   wow
5 65   2   2   2   2   2   2   2   2   2   2   2   1   2   2   wow

For your third and fourth bullet points under the things you wish to do but have struggled doing, you can easily do these nested structures with the mutate() function to define these new columns as you wish, as I have done.

That said, for the further iterations of C, D, and so on, you'll have to develop quite a nested conditionality here. There might be a more elegant way to solve this, but this approach should generally work. You could just further add more layers of case_when(). So, in my example code, starting at line 3, it might look like

B_1 == B_2 ~ case_when(
     C_1 == C_2 ~ case_when(
         ....
     ),
B_1 > B_2 ~ B_1, #New col's rows meeting this condition are assigned B_1
B_2 > B_1 ~ B_2 #These last two are my attempt to do your 3rd bullet point

I'm not exactly sure what you mean by "largest name" though -- do your columns contain both character data and numerics that are currently held as characters?

Upvotes: 1

Related Questions