Bridgbro

Reputation: 269

R Apply and Keeping All Rows with Gender Package

I'm looking for a good way to apply the gender function to a list of names (I'm pulling from XML), but I want to keep ALL the rows in order to join to additional data. Any suggestions on a good way to approach this?

Currently, my sample script drops the row for the name "Hjuk", which the gender function cannot match.

When the gender function fails, I would like to identify that gender as "Unknown" or NA. My full data set is fairly large, running about 11000 rows. Thanks for any suggestions.

Below is an example:

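library(gender)

# Sample names; "Hjuk" is the one that gets dropped
df0 <- data.frame(v1 = c("Sara","Tiffany","Tyler","Rajdeep","Josee","Hjuk"), stringsAsFactors = FALSE)
# Call gender() once per row; the result is a list with one element per name
df1 <- apply(df0, 1, function(x) gender(x))
# Stack the per-name results; the row for "Hjuk" is missing from the output
df2 <- do.call(rbind, lapply(df1, data.frame, stringsAsFactors = FALSE))
df2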

     name proportion_male proportion_female gender year_min year_max
1    Sara          0.0029            0.9971 female     1932     2012
2 Tiffany          0.0034            0.9966 female     1932     2012
3   Tyler          0.9714            0.0286   male     1932     2012
4 Rajdeep          0.7786            0.2214   male     1932     2012
5   Josee          0.0000            1.0000 female     1932     2012

Upvotes: 1

Views: 439

Answers (1)

Andrew Gustar

Reputation: 18425

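gender() accepts a whole vector of names, so you don't need apply. Pass the column in one call and left-join the result back onto your original data frame. With all.x=TRUE, merge keeps every row of df0 and fills the gender columns with NA for names that have no match, such as "Hjuk":

df1 <- merge(df0, gender(df0$v1), by.x = "v1", by.y = "name", all.x = TRUE)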
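If you also want the rows back in their original order and the label "Unknown" instead of NA, something along these lines might work. This is only a sketch: `res` is a stand-in for the output of `gender(df0$v1)`, with values copied from the question so that it runs without the package, and `row_id` is a helper column I made up to restore the order, since merge sorts by the key column by default.

```r
# Names from the question; "Hjuk" has no match in the gender data.
df0 <- data.frame(v1 = c("Sara", "Tiffany", "Tyler", "Rajdeep", "Josee", "Hjuk"),
                  stringsAsFactors = FALSE)

# Stand-in for gender(df0$v1), using the values shown in the question.
# With the package installed, this could be replaced by: res <- gender(df0$v1)
res <- data.frame(
  name   = c("Sara", "Tiffany", "Tyler", "Rajdeep", "Josee"),
  gender = c("female", "female", "male", "male", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: remember the original row order,
# because merge() sorts the result by the key column by default.
df0$row_id <- seq_len(nrow(df0))

# Left join: all.x = TRUE keeps every row of df0; unmatched names get NA.
df1 <- merge(df0, res, by.x = "v1", by.y = "name", all.x = TRUE)

# Restore the original order and label the unmatched names.
df1 <- df1[order(df1$row_id), ]
df1$gender[is.na(df1$gender)] <- "Unknown"
rownames(df1) <- NULL
df1
```

If NA is fine for your downstream join, you can skip the replacement step and keep just the merge and the reordering.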

Upvotes: 2
