pha
pha

Reputation: 356

How to assign values to a new column in R depending on values from other columns

Creating a new column in my data frame of > 900,000 rows to which I want values to be

  1. NA if any of the values in a set of columns are NA
df$newcol[is.na(df$somecol_1) | is.na(df$somecol_2) | is.na(df$somecol_3)] <- NA
  1. 0 if all of the values in a set of columns are 0
df$newcol[df$somecol_1==0 & df$somecol_2==0 & df$somecol_3==0] <- 0
  1. 1 if any of the values in a set of columns are 1 while none is NA. This is the tricky part as it creates a myriad of combinations with my ten columns. The whole data frame has >50 columns, of which I have ten columns of interest for this procedure and here I present only three:
df$newcol[df$somecol_1==1 & df$somecol_2==0 & df$somecol_3==0] <- 1
df$newcol[df$somecol_1==1 & df$somecol_2==1 & df$somecol_3==0] <- 1
df$newcol[df$somecol_1==1 & df$somecol_2==1 & df$somecol_3==1] <- 1
df$newcol[df$somecol_1==0 & df$somecol_2==1 & df$somecol_3==0] <- 1
df$newcol[df$somecol_1==0 & df$somecol_2==1 & df$somecol_3==1] <- 1
df$newcol[df$somecol_1==0 & df$somecol_2==0 & df$somecol_3==1] <- 1
df$newcol[df$somecol_1==1 & df$somecol_2==0 & df$somecol_3==1] <- 1

I have a feeling I am overthinking this, there must be a way to make 3 easier? Writing different combinations of columns as shown above would take forever with ten. And a loop would go too slow due to the large dataset.

Dummy data:

df <- NULL
df$somecol_1 <- c(1,0,0,NA,0,1,0,NA,1,1)
df$somecol_2 <- c(NA,1,0,0,0,1,0,NA,0,0)
df$somecol_3 <- c(0,0,0,0,0,0,0,0,0,0)
df <- as.data.frame(df)

Based on the above, I want the new column to be

df$newcol <- c(NA,1,0,NA,0,1,0,NA,1,1)

Upvotes: 1

Views: 1413

Answers (3)

Ricardo Semi&#227;o
Ricardo Semi&#227;o

Reputation: 4456

df$newcol = ifelse(apply(df,1,sum)>=1,1,0)

Should do the trick. First, apply will sum every row, then: Whenever there is NA values in a row (1st case), any operation will return NA; When there are only 0's (2nd case), the sum is 0 (witch isn't >=1)and the ifelse third argument makes the new entry 0; And when there is at least one 1, the sum is equal or greater than 1 and the ifelse second argument makes the new entry 1.

Edit: As you want to run theese conditions only in some columns - say its columns 1-7, 9, and 23-24 - you can use the code only in that part of the df:

df$newcol = as.numeric(rowSums(df[,c(1:7,9,23:24)])>=1)

OBS: i used the simplified code that Akrun and Gregor have answered.

If you prefer, there are ways to select columns by name: Extracting specific columns from a data frame

Upvotes: 2

akrun
akrun

Reputation: 887951

We can use rowSums

nm1 <- grep('somecol', names(df))
df$newcol <- NA^(rowSums(is.na(df[nm1])) > 0) *(rowSums(df[nm1], na.rm = TRUE) > 0)
df$newcol
#[1] NA  1  0 NA  0  1  0 NA  1  1

data

df <- structure(list(somecol_1 = c(1, 0, 0, NA, 0, 1, 0, NA, 1, 1), 
    somecol_2 = c(NA, 1, 0, 0, 0, 1, 0, NA, 0, 0), somecol_3 = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA, 
-10L))

Upvotes: 2

Gregor Thomas
Gregor Thomas

Reputation: 146224

as.numeric(rowSums(df) >= 1) 
#[1] NA  1  0 NA  0  1  0 NA  1  1

rowSums will give NA if there are any missing values. It will be 0 if all the values are 0, it will be 1 otherwise (assuming your data is all either NA, 0, or 1).

(Using akrun's sample data)

Upvotes: 2

Related Questions