Rhys Maredudd Davies
Rhys Maredudd Davies

Reputation: 33

Extracting conditional numeric values from character data in R

Stuck on a data tidying problem, and not sure how to work around it. I have messy character data on whisky, which I'm looking to organise so that I can conduct some analyses. Specifically, I'm looking to extract the abv % values from the product name, and place them into a new column. An example of the current data would look like this:

Name Price
Aberfeldy, 12 year old, 40% 40
Ardbeg, 1974 Vintage, Cask #3524, 49.9% 20000

Whilst what I would like the data to look like this:

Name Price ABV
Aberfeldy, 12 year old, 40% 40 40
Ardbeg, 1974 Vintage, Cask #3524, 49.9% 20000 49

I have so little experience with messy data of this sort, so I have no idea what needs to be done. I have tried experimenting with gsub, and I'm getting frustratingly close! (I can extract the 40% value from the Aberfeldy, but can only extract the 9 from the 49.9% of the Ardbeg).

#demo
  name <- c("Aberfeldy, 12 year old, 40%", "Ardbeg, 1974 Vintage, Cask #3524, 49.9%")
  price <- c(40, 20000)
example_data <- data.frame(name, price)

attempt <- as.numeric(gsub(".*?([0-9]+).*%", "\\1", example_data$name))

# Attemp results
40  9

What can I do to fix this?

Upvotes: 0

Views: 19

Answers (0)

Related Questions