jamesguy0121

Reputation: 1254

Create new variable based on the value of several other variables

I have a data set with multiple variables that I want to use to create a new variable. I have seen other questions like this that use ifelse, but chaining ifelse calls would be extremely inefficient here, since the new variable is based on 32 other variables. The variables are coded with values of 1, 2, 3, or NA, and I want the new variable to be coded as 1 if 2 or more of the 32 variables take on a value of 1, and 2 otherwise. Here is a small example of what I have been trying to do.

df <- data.frame(id = 1:10, v1 = c(1,2,2,2,3,NA,2,2,2,2), v2 = c(2,2,2,2,2,1,2,1,2,2), 
             v3 = c(1,2,2,2,2,3,2,2,2,2), v4 = c(2,2,2,2,2,1,2,2,2,3))

and the result I am looking for is this:

   id v1 v2 v3 v4 new
1   1  1  2  1  2   1
2   2  2  2  2  2   2
3   3  2  2  2  2   2
4   4  2  2  2  2   2
5   5  3  2  2  2   2
6   6 NA  1  3  1   1
7   7  2  2  2  2   2
8   8  2  1  2  2   2
9   9  2  2  2  2   2
10 10  2  2  2  3   2

I have also tried using rowSums within an ifelse statement, but because of the missing values this doesn't work for all observations unless I recode the NAs to another value, which I want to avoid. Besides that, I feel like there should be a much more efficient way of doing this.

This question has likely been answered before, but I couldn't find anything on it. Any help, or a pointer to a previous answer, would be appreciated.

Upvotes: 1

Views: 3581

Answers (1)

G106863

Reputation: 88

It looks like you were very close to getting your desired output, but you were probably missing the na.rm = TRUE argument as part of your rowSums() call. This will remove any NAs before rowSums does its calculations.
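To see why the argument matters, here is a minimal sketch using the example data frame from the question. The names without_na_rm and with_na_rm are just illustrative. Row 6 contains an NA in v1, so without na.rm = TRUE its sum is NA even though two of its other values are 1:

```r
df <- data.frame(id = 1:10, v1 = c(1,2,2,2,3,NA,2,2,2,2), v2 = c(2,2,2,2,2,1,2,1,2,2),
                 v3 = c(1,2,2,2,2,3,2,2,2,2), v4 = c(2,2,2,2,2,1,2,2,2,3))

# Without na.rm, a row whose comparison contains an NA sums to NA
without_na_rm <- rowSums(df[-1] == 1)

# With na.rm = TRUE, the NA is dropped and the remaining 1s are counted
with_na_rm <- rowSums(df[-1] == 1, na.rm = TRUE)

without_na_rm[6]  # NA
with_na_rm[6]     # 2
```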

Anyway, using your data frame from above, I created a new variable that counts the number of times 1 appears across the variables, while ignoring NA values. Note that I've subsetted the data to exclude the id column:

df$count <- rowSums(df[-1] == 1, na.rm = TRUE)

Then I created another variable using an ifelse statement that returns 1 if the count is 2 or more, and 2 otherwise.

df$var <- ifelse(df$count >= 2, 1, 2)

The returned output:

   id v1 v2 v3 v4 count var
1   1  1  2  1  2     2   1
2   2  2  2  2  2     0   2
3   3  2  2  2  2     0   2
4   4  2  2  2  2     0   2
5   5  3  2  2  2     0   2
6   6 NA  1  3  1     2   1
7   7  2  2  2  2     0   2
8   8  2  1  2  2     1   2
9   9  2  2  2  2     0   2
10 10  2  2  2  3     0   2
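One caveat: df[-1] takes every column except id, so it would also pick up count and var if the rowSums line were rerun after those columns exist. A rough sketch of one way around that, assuming the 32 variables share a name pattern (here "v" followed by digits, as in the example; your real column names may differ, and vars is just an illustrative name), is to select the input columns by name:

```r
df <- data.frame(id = 1:10, v1 = c(1,2,2,2,3,NA,2,2,2,2), v2 = c(2,2,2,2,2,1,2,1,2,2),
                 v3 = c(1,2,2,2,2,3,2,2,2,2), v4 = c(2,2,2,2,2,1,2,2,2,3))

# Assumption: the input columns are exactly those named "v" followed by digits
vars <- grep("^v[0-9]+$", names(df), value = TRUE)

# Count the 1s across only those columns, ignoring NAs
df$count <- rowSums(df[vars] == 1, na.rm = TRUE)

# Recode: 1 if two or more of the variables equal 1, otherwise 2
df$var <- ifelse(df$count >= 2, 1, 2)
```

Because the pattern requires digits after the "v", the new var column is not matched, so these lines can safely be rerun.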

UPDATE / EDIT: As mentioned by Gregor in the comments, you can also wrap the rowSums() call directly inside ifelse() to do this in one line of code, without the intermediate count column.
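A sketch of that one-line version, using the example data frame from the question (the column name new is taken from the desired output in the question):

```r
df <- data.frame(id = 1:10, v1 = c(1,2,2,2,3,NA,2,2,2,2), v2 = c(2,2,2,2,2,1,2,1,2,2),
                 v3 = c(1,2,2,2,2,3,2,2,2,2), v4 = c(2,2,2,2,2,1,2,2,2,3))

# Count the 1s per row (ignoring NAs) and recode in a single step
df$new <- ifelse(rowSums(df[-1] == 1, na.rm = TRUE) >= 2, 1, 2)
```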

Upvotes: 3
