Reputation: 18830
I'm trying to replace missing demographic values by assigning them randomly.
I can get the rows with an empty gender as follows:
df[df$gender == "", ]
user_id, name, age, gender
001, xyz, 23,
004, abc, 32,
I want to assign gender randomly:
sample(c('male', 'female'), nrow(df$gender[df$gender == ""]), replace=TRUE, prob=c(0.5, 0.5))
I tried the following:
df$gender[df$gender == ""] <- sample(c('male', 'female'), nrow(df$gender[df$gender == ""]), replace=TRUE, prob=c(0.5, 0.5))
This assigned values to only a few cells, not all of them.
Upvotes: 3
Views: 569
Reputation: 248
Using the following example:
user_id <- c(1:5)
name <- c("a","b","c","d","e")
age <- c(20,23,44,21,32)
gender <- c("m","f","","", "m")
df <- data.frame(user_id,
name,
age,
gender,
stringsAsFactors = FALSE)
I suggest creating a vector of random values with one entry per row, i.e. of length nrow(df):
rand_gender <- sample(c('m', 'f'), nrow(df), replace=TRUE, prob=c(0.5, 0.5))
Then replace only where gender is empty:
df$gender <- ifelse(df$gender=="", rand_gender, df$gender)
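As a quick check, here is a minimal self-contained sketch of the same approach that confirms only the blank entries change. The set.seed call and the was_blank helper are my additions for illustration, not part of the answer above.

```r
# Rebuild the example data frame so the block runs on its own
df <- data.frame(user_id = 1:5,
                 name = c("a", "b", "c", "d", "e"),
                 age = c(20, 23, 44, 21, 32),
                 gender = c("m", "f", "", "", "m"),
                 stringsAsFactors = FALSE)

set.seed(42)                  # assumption: fixed seed, only for reproducibility
was_blank <- df$gender == ""  # hypothetical helper: remember which rows were empty

# One random value per row; only those at blank positions are used
rand_gender <- sample(c("m", "f"), nrow(df), replace = TRUE)
df$gender <- ifelse(was_blank, rand_gender, df$gender)

# Existing values are untouched, and no empty strings remain
df$gender[!was_blank]
any(df$gender == "")
```

Drawing nrow(df) values wastes a few draws, but it keeps the random vector aligned with the rows, which is what ifelse needs.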
Upvotes: 3
Reputation: 3722
You should use length instead of nrow. df$gender[df$gender == ""] returns a vector, since you're subsetting df$gender, and nrow returns NULL for a vector. You also don't need prob = c(0.5, 0.5): by default sample gives every element the same probability, so two options are already split 50/50. You would use prob only if you wanted an uneven split, for example 70/30 for male/female.
df$gender[df$gender == ""] <- sample(c('male', 'female'), length(df$gender[df$gender == ""]), replace=TRUE)
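To illustrate both points, here is a small sketch on made-up data (the data frame, the blank index and the seed are assumptions for the example, not from the question). It shows what nrow and length return for the subset, and how prob would be used for the 70/30 case.

```r
# Hypothetical data, not from the original post
df <- data.frame(user_id = 1:6,
                 gender = c("male", "", "female", "", "", "male"),
                 stringsAsFactors = FALSE)

blank <- df$gender == ""   # logical index, computed once

nrow(df$gender[blank])     # NULL: a plain vector has no dim attribute
length(df$gender[blank])   # 3
sum(blank)                 # 3, the same count without subsetting

set.seed(1)                # assumption: fixed seed, only for reproducibility
# prob is matched to the choices by position: 70% male, 30% female
df$gender[blank] <- sample(c("male", "female"), sum(blank),
                           replace = TRUE, prob = c(0.7, 0.3))
```

With only three blanks, the observed split will not necessarily be close to 70/30; the weights only describe the chance for each individual draw.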
Upvotes: 1