Reputation: 33
Stuck on a data tidying problem, and not sure how to work around it. I have messy character data on whisky, which I'm looking to organise so that I can conduct some analyses. Specifically, I'm looking to extract the abv % values from the product name, and place them into a new column. An example of the current data would look like this:
Name | Price |
---|---|
Aberfeldy, 12 year old, 40% | 40 |
Ardbeg, 1974 Vintage, Cask #3524, 49.9% | 20000 |
Whilst what I would like the data to look like this:
Name | Price | ABV |
---|---|---|
Aberfeldy, 12 year old, 40% | 40 | 40 |
Ardbeg, 1974 Vintage, Cask #3524, 49.9% | 20000 | 49 |
I have so little experience with messy data of this sort, so I have no idea what needs to be done. I have tried experimenting with gsub
, and I'm getting frustratingly close! (I can extract the 40% value from the Aberfeldy, but can only extract the 9 from the 49.9% of the Ardbeg).
#demo
name <- c("Aberfeldy, 12 year old, 40%", "Ardbeg, 1974 Vintage, Cask #3524, 49.9%")
price <- c(40, 20000)
example_data <- data.frame(name, price)
attempt <- as.numeric(gsub(".*?([0-9]+).*%", "\\1", example_data$name))
# Attemp results
40 9
What can I do to fix this?
Upvotes: 0
Views: 19