PaulaF

Reputation: 393

Create new variable based on other columns using R

I have a huge file where I want to create a column based on other columns. My file looks like this:

person = c(1,2,3,4,5,6,7,8)
father = c(0,0,1,1,4,5,5,7)
mother = c(0,0,2,3,2,2,6,6)
ped = data.frame(person,father,mother)

I want to create a column indicating whether each person is a father or a mother (a gender column). I got it working with a for loop on a small example, but when I apply it to the whole file it takes hours to finish. How can I write an apply function to solve this? Thanks.

for(i in 1:nrow(ped)){
  ped$test[i] = ifelse(ped[i,1] %in% ped[,2], "M", ifelse(ped[i,1] %in% ped[,3], "F", NA)) 
}

Upvotes: 3

Views: 1423

Answers (3)

thelatemail

Reputation: 93813

Without having to spell out each "father" / "mother" / etc. option in the code, you could do:

vars <- c("father","mother")
factor(
  do.call(pmax, Map(function(x,y) (ped$person %in% x) * y, ped[vars], seq_along(vars) )),
  labels=c(NA,"M","F")
)
#[1] M    F    F    M    M    F    M    <NA>
#Levels: <NA> M F
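
To make the one-liner above easier to follow, here is a minimal sketch that unpacks it step by step on the data from the question. The intermediate names `codes` and `code` are introduced here for illustration only. The last step indexes a label vector instead of calling factor(), which is a swapped-in variation: factor(..., labels=) needs the number of labels to match the number of distinct codes, so it would fail if, say, every person were a parent.

```r
# Same data as in the question
ped <- data.frame(person = c(1, 2, 3, 4, 5, 6, 7, 8),
                  father = c(0, 0, 1, 1, 4, 5, 5, 7),
                  mother = c(0, 0, 2, 3, 2, 2, 6, 6))

vars <- c("father", "mother")

# One numeric vector per parent column: 0 if the person never appears in
# that column, otherwise the column's position in `vars` (1 or 2)
codes <- Map(function(x, y) (ped$person %in% x) * y,
             ped[vars], seq_along(vars))

# Element-wise maximum across columns: 0 = neither, 1 = father, 2 = mother
code <- do.call(pmax, unname(codes))

# Look the label up by position (code + 1) rather than relabelling a factor
ped$gender <- c(NA, "M", "F")[code + 1]
```

Because pmax() keeps the largest code, a person who appears in both columns would get the label of the later entry in `vars`.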

Upvotes: 2

akrun

Reputation: 886938

You could try

ped$gender <- c(NA, 'M', 'F')[as.numeric(factor(with(ped, 
                  1+2*person %in% father + 4*person %in% mother)))]
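
As a rough illustration of how that arithmetic encoding works: `%in%` binds more tightly than `*` and `+`, so each person gets the code 1 (neither), 3 (father only), 5 (mother only) or 7 (both). The sketch below shows the intermediate codes and looks the label up by name, so the result does not depend on how many distinct codes happen to occur. The names `code` and `lookup` are hypothetical, and mapping "both" to NA is an assumption, not something stated in the answer.

```r
person <- c(1, 2, 3, 4, 5, 6, 7, 8)
father <- c(0, 0, 1, 1, 4, 5, 5, 7)
mother <- c(0, 0, 2, 3, 2, 2, 6, 6)

# Parsed as 1 + 2 * (person %in% father) + 4 * (person %in% mother)
code <- 1 + 2 * person %in% father + 4 * person %in% mother

# Named lookup table: 1 = neither, 3 = father, 5 = mother, 7 = both
# (treating "both" as NA is an assumption made for this sketch)
lookup <- c("1" = NA, "3" = "M", "5" = "F", "7" = NA)
gender <- unname(lookup[as.character(code)])
```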

Or a faster option would be to assign by reference with := from data.table

library(data.table)
setDT(ped)[person %in% father, gender:='M'][person %in% mother, gender:='F']

Upvotes: 3

B.Shankar

Reputation: 1281

Try this:

ped <- transform(ped, gender = ifelse(person %in% father,
                                      'M',
                                      ifelse(person %in% mother, 'F', NA)
                                     ))

Instead of looping over the individual values across the rows, this uses vectorization.
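
To illustrate the difference, here is a small sketch that runs the loop from the question next to a vectorized version and checks that they agree. The vectorized part uses logical-index assignment rather than nested ifelse(); that is just one possible way to write it, and `loop_result` is a name made up for this example.

```r
ped <- data.frame(person = c(1, 2, 3, 4, 5, 6, 7, 8),
                  father = c(0, 0, 1, 1, 4, 5, 5, 7),
                  mother = c(0, 0, 2, 3, 2, 2, 6, 6))

# Loop version from the question: one %in% lookup per row
loop_result <- rep(NA_character_, nrow(ped))
for (i in seq_len(nrow(ped))) {
  loop_result[i] <- ifelse(ped[i, 1] %in% ped[, 2], "M",
                           ifelse(ped[i, 1] %in% ped[, 3], "F", NA))
}

# Vectorized version: %in% tests every person in a single call
ped$gender <- NA_character_
ped$gender[ped$person %in% ped$mother] <- "F"
# Assigned last so that "M" wins for anyone in both columns,
# matching the order of the nested ifelse()
ped$gender[ped$person %in% ped$father] <- "M"

# Both approaches give the same labels
same <- identical(ped$gender, loop_result)
```

On a large file the vectorized form calls `%in%` twice in total instead of up to twice per row, which is where the time saving comes from.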

Upvotes: 3
