Extracting conditional numeric values from character data in R

Question

Stuck on a data tidying problem, and not sure how to work around it. I have messy character data on whisky, which I'm looking to organise so that I can conduct some analyses. Specifically, I'm looking to extract the abv % values from the product name, and place them into a new column. An example of the current data would look like this:

Name	Price
Aberfeldy, 12 year old, 40%	40
Ardbeg, 1974 Vintage, Cask #3524, 49.9%	20000

Whilst what I would like the data to look like this:

Name	Price	ABV
Aberfeldy, 12 year old, 40%	40	40
Ardbeg, 1974 Vintage, Cask #3524, 49.9%	20000	49

I have so little experience with messy data of this sort, so I have no idea what needs to be done. I have tried experimenting with gsub, and I'm getting frustratingly close! (I can extract the 40% value from the Aberfeldy, but can only extract the 9 from the 49.9% of the Ardbeg).

#demo
  name <- c("Aberfeldy, 12 year old, 40%", "Ardbeg, 1974 Vintage, Cask #3524, 49.9%")
  price <- c(40, 20000)
example_data <- data.frame(name, price)

attempt <- as.numeric(gsub(".*?([0-9]+).*%", "\1", example_data$name))

# Attemp results
40  9

What can I do to fix this?

Extracting conditional numeric values from character data in R

Answers (0)

Related Questions