Reputation: 25
I've been trying to use loops but they just aren't working, since I haven't done anything with this complexity before and I have been trying to solve this with dplyer, which has been an impossible exercise. I don't know what to do. Can anyone suggest any solutions?
dat <- read.table(text = "ID A_1 A_2 B_1 B_2 C_1 C_2 D_1 D_2 E_1 E_2 F_1 F_2 G_1 G_2
11 1 2 3 4 3 4 3 4 3 4 3 4 3 4
32 5 6 7 8 6 7 6 7 6 7 6 7 6 7
73 15 15 10 10 3 4 3 4 3 4 3 4 2 2
84 13 13 15 15 4 4 3 4 3 4 3 4 2 2
65 2 2 2 2 2 2 2 2 2 2 2 1 2 2
", header = TRUE)
The above was easy enough with dplyer conditioning. However, the below was impossible to do.
And then
Any guidance would be appreciated!
The yellow is the terminal comparison, the rest of the row whether column _1 or _2 is selected is based on the yellow item was _1 or _2
Upvotes: 0
Views: 103
Reputation: 83
Have you considered using case_when()
in tidyverse
? It allows for complex vectorized conditional statements. For example, although we do not have your expected output, this code structure could help you achieve some of your goals.
> dat <- dat %>% mutate(test = case_when(
+ A_1 == A_2 ~ case_when( #If A_1 and A_2 are the same, then check...
+ B_1 == B_2 ~ "wow", #Whether B_1 and B_2 are the same and do "wow"
+ B_1 != B_2 ~ "cool" #Whether they are different, and do "cool"
+ ),
+ A_1 != A_2 ~ "diff" #If the A's are not the same, define "diff"
+ ))
To understand what is going on here, the first case_when()
is checking whether A_1 is the same as A_2 (for each row, it does it as a vector). If it is, then it moves to the next case_when()
and checks whether B_1 is equal to B_2, or if they are different. You can see the output here. I defined a new column test
based on these nested, vectorizable conditions, removing the need for for loops.
ID A_1 A_2 B_1 B_2 C_1 C_2 D_1 D_2 E_1 E_2 F_1 F_2 G_1 G_2 test
1 11 1 2 3 4 3 4 3 4 3 4 3 4 3 4 diff
2 32 5 6 7 8 6 7 6 7 6 7 6 7 6 7 diff
3 73 15 15 10 10 3 4 3 4 3 4 3 4 2 2 wow
4 84 13 13 15 15 4 4 3 4 3 4 3 4 2 2 wow
5 65 2 2 2 2 2 2 2 2 2 2 2 1 2 2 wow
For your third and fourth bullet points under the things you wish to do but have struggled doing, you can easily do these nested structures with the mutate()
function to define these new columns as you wish, as I have done.
That said, for the further iterations of C, D, and so on, you'll have to develop quite a nested conditionality here. There might be a more elegant way to solve this, but this approach should generally work. You could just further add more layers of case_when()
. So, in my example code, starting at line 3, it might look like
B_1 == B_2 ~ case_when(
C_1 == C_2 ~ case_when(
....
),
B_1 > B_2 ~ B_1, #New col's rows meeting this condition are assigned B_1
B_2 > B_1 ~ B_2 #These last two are my attempt to do your 3rd bullet point
I'm not exactly sure what you mean by "largest name" though -- do your columns contain both character data and numerics that are currently held as characters?
Upvotes: 1