Reputation: 9
I have a dataset that looks like this:
Observation | Age | Family members | Income |
---|---|---|---|
1 | 25 | 2 | 3k |
2 | 29 | 4 | 1k |
3 | 32 | Six | 3k |
3 | 32 | Five | 5k |
I'm a Stata user, so I've been stuck on this problem for a while. How can I convert the variable Family members into a numeric variable, given that it contains both digits and numbers written out as words?
Upvotes: 0
Views: 167
Reputation: 886948
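We can use as.numeric, but on its own it converts the non-numeric elements (such as "Six") to NA and raises a warning.
To keep those values, first match the number words against as.english(1:9) from the english package, then fall back to the original value with coalesce: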
library(english)
library(dplyr)
df1$Family_members <- with(df1, as.numeric(coalesce(
  as.character(match(toupper(Family_members), toupper(as.english(1:9)))),
  Family_members)))
Output:
df1
  Observation Age Family_members Income
1           1  25              2     3k
2           2  29              4     1k
3           3  32              6     3k
4           3  32              5     5k
Data:
df1 <- structure(list(Observation = c(1L, 2L, 3L, 3L), Age = c(25L,
    29L, 32L, 32L), Family_members = c("2", "4", "Six", "Five"),
    Income = c("3k", "1k", "3k", "5k")), class = "data.frame", row.names = c(NA,
    -4L))
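To see which entries a plain as.numeric call would lose, here is a minimal base R sketch. The vector x is a hypothetical stand-in for the Family_members column, and suppressWarnings is used only to keep the output quiet:

```r
# Hypothetical stand-in for df1$Family_members
x <- c("2", "4", "Six", "Five")

# Plain conversion: number words cannot be parsed, so they become NA
plain <- suppressWarnings(as.numeric(x))
plain
#> [1]  2  4 NA NA

# Entries that were present in the input but failed to convert
failed <- x[is.na(plain) & !is.na(x)]
failed
#> [1] "Six"  "Five"
```

Inspecting failed before converting shows which number words the lookup needs to cover.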
Upvotes: 1
Reputation: 16978
Assuming a family doesn't have more than 19 members, you could use a custom function like this:
word2num <- function(word) {
  # Case-insensitive lookup: the position in the vector is the number itself
  word <- tolower(word)
  output <- match(word,
    c("one", "two", "three", "four", "five", "six", "seven",
      "eight", "nine", "ten", "eleven", "twelve", "thirteen",
      "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen")
  )
  # Not a number word: treat the value as digits
  if (is.na(output)) {
    return(as.numeric(word))
  }
  output
}
and apply it to your data:
df$Family_members <- sapply(df$Family_members, word2num)
This returns
  Observation Age Family_members Income
1           1  25              2     3k
2           2  29              4     1k
3           3  32              6     3k
4           3  32              5     5k
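If you would rather avoid sapply, the same lookup can be written to work on the whole column at once. This is only a sketch: word2num_vec is a hypothetical name, and the trimws call and the suppressed warning are my assumptions, not part of the function above.

```r
# Hypothetical vectorized variant of word2num(); handles a whole column at once
word2num_vec <- function(x) {
  words <- c("one", "two", "three", "four", "five", "six", "seven",
             "eight", "nine", "ten", "eleven", "twelve", "thirteen",
             "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
             "nineteen")
  # Normalize case and surrounding whitespace before matching
  x <- trimws(tolower(as.character(x)))
  idx <- match(x, words)
  # Parse everything as digits; entries that are not digits become NA here
  num <- suppressWarnings(as.numeric(x))
  # Prefer the word lookup where it matched, otherwise use the parsed number
  ifelse(is.na(idx), num, idx)
}

word2num_vec(c("2", "4", "Six", "Five"))
#> [1] 2 4 6 5
```

Values that are neither digits nor one of the listed words come back as NA, so a quick check with anyNA() afterwards is a reasonable way to spot anything the lookup missed.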
Upvotes: 1