Reputation: 393
I have a huge file in which I want to create a column based on other columns. My file looks like this:
person = c(1,2,3,4,5,6,7,8)
father = c(0,0,1,1,4,5,5,7)
mother = c(0,0,2,3,2,2,6,6)
ped = data.frame(person,father,mother)
I want to create a column indicating whether the person is a father or a mother (a gender column). I got it working with a for loop on a small example, but when I apply it to the whole file it takes hours to finish. How can I write this with an apply function instead, please? Thanks.
for (i in 1:nrow(ped)) {
  ped$test[i] = ifelse(ped[i,1] %in% ped[,2], "M", ifelse(ped[i,1] %in% ped[,3], "F", NA))
}
Upvotes: 3
Views: 1423
Reputation: 93813
Without having to hard-code each "father" / "mother" / etc. option, you could do:
vars <- c("father","mother")
factor(
do.call(pmax, Map(function(x,y) (ped$person %in% x) * y, ped[vars], seq_along(vars) )),
labels=c(NA,"M","F")
)
#[1] M F F M M F M <NA>
#Levels: <NA> M F
Upvotes: 2
Reputation: 886938
You could try
ped$gender <- c(NA, 'M', 'F')[as.numeric(factor(with(ped,
    1 + 2*(person %in% father) + 4*(person %in% mother))))]
Or a faster option would be to assign with := using data.table
library(data.table)
setDT(ped)[person %in% father, gender:='M'][person %in% mother, gender:='F']
Upvotes: 3
Reputation: 1281
Try this:
ped <- transform(ped,
  gender = ifelse(person %in% father, 'M',
                  ifelse(person %in% mother, 'F', NA)))
Instead of looping over the rows one value at a time, this uses vectorization: %in% and ifelse each operate on the whole column at once.
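As a quick sanity check, here is a minimal sketch on the sample data from the question that runs the original loop and the vectorized form side by side and confirms they agree. The names loop_result and vec_result are made up for illustration and do not appear in the question.

```r
# Sample data from the question
person <- c(1, 2, 3, 4, 5, 6, 7, 8)
father <- c(0, 0, 1, 1, 4, 5, 5, 7)
mother <- c(0, 0, 2, 3, 2, 2, 6, 6)
ped <- data.frame(person, father, mother)

# Row-by-row version, as in the question (loop_result is a hypothetical name)
loop_result <- character(nrow(ped))
for (i in seq_len(nrow(ped))) {
  loop_result[i] <- ifelse(ped[i, 1] %in% ped[, 2], "M",
                           ifelse(ped[i, 1] %in% ped[, 3], "F", NA))
}

# Vectorized version: one pass over the whole column (vec_result is a hypothetical name)
vec_result <- with(ped, ifelse(person %in% father, "M",
                               ifelse(person %in% mother, "F", NA)))

# Both approaches should give the same character vector
print(identical(loop_result, vec_result))
print(vec_result)
```

On this sample, persons 1, 4, 5 and 7 appear in the father column and 2, 3 and 6 in the mother column, so person 8 is the only NA. Note that both forms check father first, so a person listed in both columns would be labelled "M".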
Upvotes: 3