Reputation: 109
I can't seem to get my code to work when I want to create a new column with a single integer from multiple conditions from multiple columns.
I have 4 vars: pun1, pun2, pun3, pun4.
I want to combine these columns into a new column (pun_severity_out) and give it a 1 if all conditions are true. Whenever a condition is not true, the value should be pun_severity_out = 0.
The thing here is that pun1 and pun2 are grouped together, and so are pun3 and pun4.
Whenever there's an NA, this means that person has been evaluated by the others (you can't punish yourself).
Since these subjects are grouped, we have an ingroup and an outgroup. So if pun1 == NA, this means that the outgroup is pun3 & pun4. For clarity, if pun3 == NA, then the outgroup is pun1 & pun2.
What I want is a single value, 1, whenever both outgroup members have a value of 4 or higher. But only if there's an NA present in the other group, because we specifically want outgroup members.
Edit: sample data
   UniqueSS subject group       part round  treatment pun1 pun2 pun3 pun4 severity_pun_out
 1       11       1     1 punishment     0 homogenous   NA    0    0    0                0
 2       12       2     1 punishment     0 homogenous    0   NA    0    0                0
 3       13       3     1 punishment     0 homogenous    0    0   NA    0                0
 4       14       4     1 punishment     0 homogenous    0    0    1   NA                0
 5       11       1     1 punishment     1 homogenous   NA    0    0    0                0
 6       12       2     1 punishment     1 homogenous    0   NA    0    0                0
 7       13       3     1 punishment     1 homogenous    0    0   NA    0                0
 8       14       4     1 punishment     1 homogenous    0    0    0   NA                0
 9       11       1     1 punishment     2 homogenous   NA    0    0    0                0
10       12       2     1 punishment     2 homogenous    0   NA    5    4                1
My best attempt is this, but this gives NAs when using more ifelse() inside the same statement:
df5$severity_pun_out <- with(df5, ifelse(pun1 == NA & pun3 >= 4 & pun4 >= 4, 1, ifelse(pun2 == NA & pun3 >= 4 & pun4 >= 4, 1, ifelse(pun3 == NA & pun1 >= 4 & pun2 >= 4, 1, ifelse(pun4 == NA & pun1 >= 4 & pun2 >= 4, 1, 0 )))))
1) If pun1 == NA, then pun3 & pun4 are the outgroup.
2) Then, if pun3 & pun4 both have values equal to or higher than 4, put down a 1 in that row of the (new) pun_severity_out column.
I think the NAs are causing some of the ruckus, but it's just a condition to be met. I am not sure how to solve this, because I am just calling for a 1, not a transformation of any NAs.
Should I call the specific row with that specific NA and then apply the outgroup transformation? I am assuming that's what I am doing with ifelse(), because we specifically use the row with that specific NA.
The code (or function) is preferably short, simple and generally applicable, and does not modify the dataset (except for possibly creating the pun_severity_out column). I might want to change the cut-off value to 3, so altering the code shouldn't take more than changing one value.
I don't often use dplyr, but if it's that much better/easier/faster I'll use it.
Bonus points if you can single out the ingroup pun(X) from the 4 variables and put its value into a new column called pun_severity_in. As in, if pun1 == NA, put pun2 in that row of the pun_severity_in column.
How can I create a column based on multiple conditions?
How do I create a new column based on multiple conditions from multiple columns?
Thanks in advance
Reputation: 145765
You can't use == to test for NA; you'll just get NA back. Use is.na() instead. Try this:
df5$severity_pun_out <- with(df5,
  ifelse(is.na(pun1) & pun3 >= 4 & pun4 >= 4, 1,
    ifelse(is.na(pun2) & pun3 >= 4 & pun4 >= 4, 1,
      ifelse(is.na(pun3) & pun1 >= 4 & pun2 >= 4, 1,
        ifelse(is.na(pun4) & pun1 >= 4 & pun2 >= 4, 1, 0)
      )
    )
  )
)
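To see why the original attempt produced NAs, here is a small base R illustration. The vector x is made up for the example and is not taken from df5. Comparing with == NA yields NA, and ifelse() returns NA wherever its test is NA. Since NA & FALSE is FALSE, the original code still returned 0 for some rows, which can make the problem look intermittent.

```r
x <- c(NA, 0, 5)   # hypothetical values, not from df5

eq_na <- x == NA   # every comparison with NA is NA
is_na <- is.na(x)  # is.na() returns a proper TRUE/FALSE vector

# ifelse() gives NA wherever its test is NA
bad  <- ifelse(x == NA, 1, 0)
good <- ifelse(is.na(x), 1, 0)

# NA combined with FALSE is FALSE, but NA combined with TRUE stays NA
na_and_false <- NA & FALSE
na_and_true  <- NA & TRUE

print(eq_na)
print(is_na)
print(bad)
print(good)
```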
A simpler alternative would be to combine the paired is.na() conditions with |, like this:
df5$severity_pun_out <- with(df5,
  ifelse((is.na(pun1) | is.na(pun2)) & pun3 >= 4 & pun4 >= 4, 1,
    ifelse((is.na(pun3) | is.na(pun4)) & pun1 >= 4 & pun2 >= 4, 1, 0)
  )
)
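Since the question mentions possibly lowering the cut-off to 3, one option is to wrap the same condition in a small function so that only one value has to change. This is only a sketch: the function name severity_out and its cutoff argument are made up here, the toy data frame merely imitates a few rows of the sample data, and it assumes each row has exactly one NA among pun1 to pun4.

```r
# Hypothetical helper: returns 1 when the NA sits in one pair and both
# members of the other pair are >= cutoff, otherwise 0.
severity_out <- function(pun1, pun2, pun3, pun4, cutoff = 4) {
  na_in_12 <- is.na(pun1) | is.na(pun2)  # NA is in the pun1/pun2 pair
  na_in_34 <- is.na(pun3) | is.na(pun4)  # NA is in the pun3/pun4 pair
  hit <- (na_in_12 & pun3 >= cutoff & pun4 >= cutoff) |
         (na_in_34 & pun1 >= cutoff & pun2 >= cutoff)
  as.integer(hit)
}

# Toy data shaped like the question's sample (not the real df5)
toy <- data.frame(
  pun1 = c(NA, 0, 0),
  pun2 = c(0, NA, 0),
  pun3 = c(0, 5, NA),
  pun4 = c(0, 4, 0)
)

toy$severity_pun_out <- with(toy, severity_out(pun1, pun2, pun3, pun4))
toy$severity_cut6    <- with(toy, severity_out(pun1, pun2, pun3, pun4, cutoff = 6))
print(toy)
```

Changing the threshold is then a matter of passing a different cutoff, and the original columns are left untouched.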
In dplyr, you could use case_when(), which can be simpler than nested ifelse() calls, but that is a matter of style.
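For comparison, a case_when() version might look like the sketch below. It assumes dplyr is installed, uses a made-up toy data frame rather than df5, and introduces a cutoff variable for illustration. case_when() treats an NA condition as not matched, so the fallback TRUE ~ 0 applies to those rows. The second case_when() is one possible reading of the "bonus" request: it copies the value of the ingroup partner (the other member of the pair containing the NA) into pun_severity_in.

```r
library(dplyr)

# Toy data shaped like the question's sample (not the real df5)
toy <- data.frame(
  pun1 = c(NA, 0, 0),
  pun2 = c(0, NA, 0),
  pun3 = c(0, 5, NA),
  pun4 = c(0, 4, 2)
)

cutoff <- 4  # illustrative threshold; change to 3 if needed

toy <- toy %>%
  mutate(
    # 1 when the NA is in one pair and both outgroup values reach the cutoff
    severity_pun_out = case_when(
      (is.na(pun1) | is.na(pun2)) & pun3 >= cutoff & pun4 >= cutoff ~ 1,
      (is.na(pun3) | is.na(pun4)) & pun1 >= cutoff & pun2 >= cutoff ~ 1,
      TRUE ~ 0
    ),
    # value of the ingroup partner, i.e. the other member of the NA's pair
    pun_severity_in = case_when(
      is.na(pun1) ~ pun2,
      is.na(pun2) ~ pun1,
      is.na(pun3) ~ pun4,
      is.na(pun4) ~ pun3
    )
  )

print(toy)
```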