fleems


Combining multiple conditions from multiple columns into a new column

I can't get my code to work when I try to create a new column holding a single integer (0 or 1) derived from conditions on several columns.

I have 4 variables: pun1, pun2, pun3 and pun4. Depending on a set of conditions, I want to create a new column (pun_severity_out) that is 1 if all conditions are true and 0 whenever any condition is false.

The catch is that pun1 and pun2 form one group, and pun3 and pun4 form another.

An NA marks the person's own slot: that row records how the person was evaluated by the others (you can't punish yourself). Because the subjects (Ss) are grouped, each row has an ingroup and an outgroup. So if pun1 is NA, the outgroup is pun3 & pun4; likewise, if pun3 is NA, the outgroup is pun1 & pun2.

What I want is for a row to get a single value of 1 when both outgroup members gave a value of 4 or higher. This should only apply when the NA is in the other group, because we specifically want outgroup members.

Edit: sample data

   UniqueSS subject group       part round  treatment pun1 pun2 pun3 pun4 severity_pun_out
1        11       1     1 punishment     0 homogenous   NA    0    0    0                0
2        12       2     1 punishment     0 homogenous    0   NA    0    0                0
3        13       3     1 punishment     0 homogenous    0    0   NA    0                0
4        14       4     1 punishment     0 homogenous    0    0    1   NA                0
5        11       1     1 punishment     1 homogenous   NA    0    0    0                0
6        12       2     1 punishment     1 homogenous    0   NA    0    0                0
7        13       3     1 punishment     1 homogenous    0    0   NA    0                0
8        14       4     1 punishment     1 homogenous    0    0    0   NA                0
9        11       1     1 punishment     2 homogenous   NA    0    0    0                0
10       12       2     1 punishment     2 homogenous    0   NA    5    4                1
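For anyone who wants to reproduce the problem, the ten sample rows above can be rebuilt in R like this. The values are copied from the printout; the target column severity_pun_out is left out because it is what the code should compute.

```r
# Rebuild the sample rows printed above as df5 (the name used in the code below).
# Scalar columns (group, part, treatment) are recycled to all 10 rows.
df5 <- data.frame(
  UniqueSS  = c(11, 12, 13, 14, 11, 12, 13, 14, 11, 12),
  subject   = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2),
  group     = 1,
  part      = "punishment",
  round     = c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2),
  treatment = "homogenous",
  # Exactly one NA per row: the person's own (self-evaluation) slot
  pun1 = c(NA, 0, 0, 0, NA, 0, 0, 0, NA, 0),
  pun2 = c(0, NA, 0, 0, 0, NA, 0, 0, 0, NA),
  pun3 = c(0, 0, NA, 1, 0, 0, NA, 0, 0, 5),
  pun4 = c(0, 0, 0, NA, 0, 0, 0, NA, 0, 4)
)
```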

My best attempt is below, but it returns NAs once I nest several ifelse() calls in the same statement:

df5$severity_pun_out <- with(df5, ifelse(pun1 == NA & pun3 >= 4 & pun4 >= 4, 1, ifelse(pun2 == NA & pun3 >= 4 & pun4 >= 4, 1, ifelse(pun3 == NA & pun1 >= 4 & pun2 >= 4, 1, ifelse(pun4 == NA & pun1 >= 4 & pun2 >= 4, 1, 0 )))))

1) If pun1 is NA, then pun3 & pun4 are the outgroup.

2) If pun3 and pun4 are both greater than or equal to 4, put a 1 in that row of the (new) pun_severity_out column.

I think the NAs are causing some of the trouble, but here they are only part of a condition to be met. I am not sure how to solve this, because I only want a 1 in the new column, not any transformation of the NAs themselves.

Should I select the specific rows containing that NA and then apply the outgroup transformation? I assumed that is what ifelse() already does, since it is evaluated on the row with that specific NA.

The code (or function) should preferably be short, simple and generally applicable, and it should not modify the dataset (except for possibly adding the pun_severity_out column). I might want to change the cut-off value to 3, so doing that should require no more than changing a single value.

I don't often use dplyr, but if it is much better, easier or faster, I'll use it.

Additional question

Bonus points if you can also pick out the ingroup pun(X) from the 4 variables and put its value into a new column called pun_severity_in. That is, if pun1 is NA, copy pun2 into that row of the pun_severity_in column.
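A minimal sketch of one way to get the ingroup value, in the same nested-ifelse() style as above. It assumes each row has exactly one NA among pun1 to pun4 (the person's own slot); the helper name ingroup_value is hypothetical.

```r
# Hypothetical helper: return the ingroup partner's value for each row.
# The partner is the other member of the pair that contains the NA.
ingroup_value <- function(pun1, pun2, pun3, pun4) {
  ifelse(is.na(pun1), pun2,
  ifelse(is.na(pun2), pun1,
  ifelse(is.na(pun3), pun4,
  ifelse(is.na(pun4), pun3,
         NA))))  # no NA in the row: no self slot, so no ingroup value
}

# Sample rows 1-4 and 10 from the data above
ingroup_value(pun1 = c(NA, 0, 0, 0, 0),
              pun2 = c(0, NA, 0, 0, NA),
              pun3 = c(0, 0, NA, 1, 5),
              pun4 = c(0, 0, 0, NA, 4))
```

On the full data frame it could be applied with df5$pun_severity_in <- with(df5, ingroup_value(pun1, pun2, pun3, pun4)).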

Sources I used

How can I create a column based on multiple conditions?

How do I create a new column based on multiple conditions from multiple columns?

https://stats.stackexchange.com/questions/115162/filtering-a-dataframe-in-r-based-on-multiple-conditions

Thanks in advance


Answer

Gregor Thomas


You can't use == to test for NA: the comparison just returns NA. Use is.na() instead. Try this:

df5$severity_pun_out <-
  with(df5, ifelse(
    is.na(pun1) &
      pun3 >= 4 &
      pun4 >= 4,
    1,
    ifelse(
      is.na(pun2) &
        pun3 >= 4 &
        pun4 >= 4,
      1,
      ifelse(
        is.na(pun3) &
          pun1 >= 4 &
          pun2 >= 4,
        1,
        ifelse(is.na(pun4) &
                 pun1 >= 4 &
                 pun2 >= 4, 1, 0
        )
      )
    )
  )
)
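To see why == NA fails where is.na() works, a quick console check with a small made-up vector x:

```r
x <- c(NA, 0, 5)

# Comparing anything with NA gives NA, never TRUE or FALSE
x == NA                  # NA NA NA
is.na(x)                 # TRUE FALSE FALSE

# ifelse() passes an NA test straight through to the result
ifelse(x == NA, 1, 0)    # NA NA NA
ifelse(is.na(x), 1, 0)   # 1 0 0

# & only returns NA when the answer is genuinely unknown
FALSE & NA               # FALSE
TRUE & NA                # NA
```

The last two lines are also why the conditions above do not produce NA: in a row where pun3 is NA, is.na(pun1) is FALSE, and FALSE & NA is FALSE.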

A simpler alternative would be to combine the paired is.na conditions with |, like this:

df5$severity_pun_out <-
  with(df5, ifelse(
    (is.na(pun1) | is.na(pun2)) &
      pun3 >= 4 &
      pun4 >= 4,
    1,
    ifelse((is.na(pun3) | is.na(pun4)) &
             pun1 >= 4 &
             pun2 >= 4,
           1, 0)
  ))
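Since the question asks for a cut-off that can be changed in one place, the | version can be wrapped in a small function. This is a sketch rather than part of the answer above: the name flag_outgroup and its cutoff argument are hypothetical, it assumes exactly one NA per row, and it replaces the nested ifelse() with as.integer() on a logical vector.

```r
# Hypothetical helper: 1 if both outgroup members scored >= cutoff, else 0.
# Assumes each row has exactly one NA among pun1..pun4 (the person's own slot).
flag_outgroup <- function(pun1, pun2, pun3, pun4, cutoff = 4) {
  # NA in the pun1/pun2 pair: the outgroup is pun3/pun4
  out_34 <- (is.na(pun1) | is.na(pun2)) & pun3 >= cutoff & pun4 >= cutoff
  # NA in the pun3/pun4 pair: the outgroup is pun1/pun2
  out_12 <- (is.na(pun3) | is.na(pun4)) & pun1 >= cutoff & pun2 >= cutoff
  # FALSE & NA is FALSE, so with one NA per row both vectors are NA-free;
  # as.integer() turns TRUE/FALSE into 1/0
  as.integer(out_34 | out_12)
}
```

Usage would be df5$severity_pun_out <- with(df5, flag_outgroup(pun1, pun2, pun3, pun4, cutoff = 3)), so switching to a cut-off of 3 only changes one argument.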

In dplyr, you could use case_when(), which can read more simply than nested ifelse() calls, but that is a matter of style.
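A sketch of the case_when() version, assuming dplyr is installed. To keep it self-contained it rebuilds only the pun columns of the sample data; the cutoff variable is an illustrative addition so the threshold lives in one place.

```r
library(dplyr)

# Only the pun columns from the sample data are needed here
df5 <- data.frame(
  pun1 = c(NA, 0, 0, 0, NA, 0, 0, 0, NA, 0),
  pun2 = c(0, NA, 0, 0, 0, NA, 0, 0, 0, NA),
  pun3 = c(0, 0, NA, 1, 0, 0, NA, 0, 0, 5),
  pun4 = c(0, 0, 0, NA, 0, 0, 0, NA, 0, 4)
)

cutoff <- 4  # change to 3 to relax the threshold

df5 <- df5 %>%
  mutate(severity_pun_out = case_when(
    # Conditions are checked top to bottom; the first TRUE one wins
    (is.na(pun1) | is.na(pun2)) & pun3 >= cutoff & pun4 >= cutoff ~ 1,
    (is.na(pun3) | is.na(pun4)) & pun1 >= cutoff & pun2 >= cutoff ~ 1,
    TRUE ~ 0  # everything else, including conditions that evaluate to NA
  ))
```

case_when() treats an NA condition like FALSE, so the fallback TRUE ~ 0 catches those rows instead of returning NA.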
