Reputation: 172
I would like to convert a height variable from character type to numeric. For context, this is so I can use the values to calculate body mass index.
In the example data frame below, I would like to convert Height_1 into Height_2 (where Height_2 is in inches):
# Height_1 Height_2
# 5ft6in 66
# XftXin XXXX
# XftXin XXXX
# XftXin XXXX
# XftXin XXXX
I have tried a few things using the "tidyverse" and "measurements" packages but have not been able to create a variable like Height_2 above. For example:
library(dplyr)
library(tidyr)
df %>%
  separate(Height_1, c('feet', 'inches'), sep = 'ft', convert = TRUE, remove = FALSE) %>%
  mutate(Height_2 = 12*feet + inches)
I think this fails because splitting on "ft" leaves the "in" at the end of the inches values, so they cannot be converted to numbers.
Upvotes: 1
Views: 621
Reputation: 389175
You can use a regular expression with tidyr's extract() to pull the feet and inches out of Height_1 and then perform the calculation.
library(dplyr)
library(tidyr)
df %>%
  extract(Height_1, c('feet', 'inches'), '(\\d+)ft(\\d+)in', convert = TRUE, remove = FALSE) %>%
  transmute(Height_1,
            Height_2 = 12*feet + inches)
# Height_1 Height_2
#1 5ft6in 66
#2 4ft9in 57
#3 5ft12in 72
#4 4ft9in 57
#5 6ft2in 74
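To illustrate why the attempt in the question did not work, here is a minimal sketch in base R (used only to keep it free of package dependencies). It assumes the same five example values as the data in this answer; the variable names `height`, `parts`, and `inches_raw` are made up for illustration. Splitting on "ft" alone leaves the "in" suffix attached, so the inches piece stays a character string until the suffix is removed.

```r
# Example values, copied from the data used in this answer
height <- c("5ft6in", "4ft9in", "5ft12in", "4ft9in", "6ft2in")

# Splitting on "ft" alone leaves the trailing "in" on the second piece
parts <- strsplit(height, "ft", fixed = TRUE)
inches_raw <- vapply(parts, `[`, character(1), 2)
inches_raw  # still character, e.g. "6in"

# Remove the suffix before converting, then combine feet and inches
feet <- as.numeric(vapply(parts, `[`, character(1), 1))
inches <- as.numeric(sub("in$", "", inches_raw))
height_2 <- 12 * feet + inches
height_2
```

The regex in extract() avoids this extra step because its two capture groups match only the digits.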
In base R, strcapture() does the same extraction (the result has the columns feet, inches and Height_2):
transform(strcapture('(\\d+)ft(\\d+)in', df$Height_1,
                     proto = list(feet = numeric(), inches = numeric())),
          Height_2 = 12*feet + inches)
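If the conversion is needed in more than one place, the base R approach could be wrapped in a small function. This is only a sketch: `height_to_inches` is a hypothetical helper name, and the `^` and `$` anchors are an added assumption so that values not written exactly as "<feet>ft<inches>in" come back as NA instead of being partially matched.

```r
# Hypothetical helper, not part of the original answer
height_to_inches <- function(x) {
  # Anchored pattern: the whole string must look like "<digits>ft<digits>in"
  parts <- strcapture("^(\\d+)ft(\\d+)in$", x,
                      proto = data.frame(feet = numeric(), inches = numeric()))
  # Strings that do not match give NA in both columns, so the result is NA
  12 * parts$feet + parts$inches
}

res <- height_to_inches(c("5ft6in", "6ft2in", "170cm"))
res
```

With the data frame from this answer, it could then be used as `df$Height_2 <- height_to_inches(df$Height_1)`.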
Data:
df <- structure(list(Height_1 = c("5ft6in", "4ft9in", "5ft12in", "4ft9in", "6ft2in")), row.names = c(NA, -5L), class = "data.frame")
Upvotes: 3