Reputation: 269
I'm looking for a good way to apply the gender() function (from the gender package) to a list of names that I'm pulling from XML, while keeping ALL the rows so that I can join the result to additional data. Any suggestions on how to approach this?
Currently, my sample script drops the row for the name "Hjuk".
When gender() fails to find a name, I would like that gender to be identified as "Unknown" or NA. My full data set is fairly large, about 11,000 rows. Thanks for any suggestions.
Below is an example:
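library(gender)
# Sample names, stored in a single character column v1
df0 <- data.frame(v1 = c("Sara","Tiffany","Tyler","Rajdeep","Josee","Hjuk"), stringsAsFactors = FALSE)
# Look up each name separately, one gender() call per row
df1 <- apply(df0, 1, function(x) gender(x))
# Bind the per-name results into one data frame; "Hjuk" is missing from the result
df2 <- do.call(rbind, lapply(df1, data.frame, stringsAsFactors=FALSE))
df2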
     name proportion_male proportion_female gender year_min year_max
1    Sara          0.0029            0.9971 female     1932     2012
2 Tiffany          0.0034            0.9966 female     1932     2012
3   Tyler          0.9714            0.0286   male     1932     2012
4 Rajdeep          0.7786            0.2214   male     1932     2012
5   Josee          0.0000            1.0000 female     1932     2012
Upvotes: 1
Views: 439
Reputation: 18425
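You can do this with a left join: call gender() once on the whole column and merge the result back onto df0. With all.x=TRUE, every row of df0 is kept, and names that gender() does not return get NA in the gender columns:
df1 <- merge(df0,gender(df0$v1),by.x="v1",by.y="name",all.x=TRUE)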
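Note that merge() sorts its result by the key column by default, so the rows will not come back in their original order. If the order matters, or if you want the label "Unknown" instead of NA, a lookup with match() is one alternative. This is only a minimal sketch: it assumes gender() returns a data frame with name and gender columns, as in the output shown in the question, and the variable names res and idx are just illustrative.

```r
library(gender)

df0 <- data.frame(v1 = c("Sara", "Tiffany", "Tyler", "Rajdeep", "Josee", "Hjuk"),
                  stringsAsFactors = FALSE)

# One lookup for all distinct names; names that are not found are absent from res
res <- gender(unique(df0$v1))

# Position of each original name in the lookup result (NA when there is no match)
idx <- match(df0$v1, res$name)

# Pull the gender across in the original row order, then label the misses
df0$gender <- res$gender[idx]
df0$gender[is.na(df0$gender)] <- "Unknown"

df0
```

Because match() returns one index per row of df0, the result always has the same number of rows, in the same order, as the input, which should make a later join to your other data straightforward.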
Upvotes: 2