Reputation: 841
My dataframe contains a URL field which sometimes contains a 13-digit product identifier. I need to extract this product ID and write it to a new column called ISBN. Below are three different URLs, each with the product ID positioned differently:
>https://catalog.macmillan.com/childrens/book/brazen/rebel-ladies-who-rocked-the-world/pnlope-bagieu/**9781626728691**?utm_source=exacttarget&utm_medium=newsletter&utm_term=na-schoolandlibrary&utm_content=na-discover-nl&utm_campaign=schoolandlibrary
>https://us.macmillan.com/excerpt?isbn=**9781250151025**&utm_source=exacttarget&utm_medium=newsletter&utm_term=na-schoolandlibrary&utm_content=na-discover-nl&utm_campaign=schoolandlibrary
>https://catalog.macmillan.com/childrens/book/so-tall-within/sojourner-truths-long-walk-toward-freedom/gary-d-schmidt/daniel-minter/**9781626728721**?utm_source=exacttarget&utm_medium=newsletter&utm_term=na-schoolandlibrary&utm_content=na-discover-nl&utm_campaign=schoolandlibrary
Upvotes: 1
Views: 249
Reputation: 72758
Using `gregexpr`, assuming that the length of the product number is always 13, as shown.
regmatches(tx, gregexpr("(\\d{13})", tx))
# [[1]]
# [1] "9781626728691" "9781250151025" "9781626728721"
Data
tx <- "https://catalog.macmillan.com/childrens/book/brazen/rebel-ladies-who-rocked-the-world/pnlope-bagieu/9781626728691?utm_source=exacttarget&utm_medium=newsletter&utm_term=na-schoolandlibrary&utm_content=na-discover-nl&utm_campaign=schoolandlibrary https://us.macmillan.com/excerpt?isbn=9781250151025&utm_source=exacttarget&utm_medium=newsletter&utm_term=na-schoolandlibrary&utm_content=na-discover-nl&utm_campaign=schoolandlibrary https://catalog.macmillan.com/childrens/book/so-tall-within/sojourner-truths-long-walk-toward-freedom/gary-d-schmidt/daniel-minter/9781626728721?utm_source=exacttarget&utm_medium=newsletter&utm_term=na-schoolandlibrary&utm_content=na-discover-nl&utm_campaign=schoolandlibrary"
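Since the question asks for a new column rather than a list of matches, here is a minimal sketch of the same idea applied row by row. The data frame `df` and its column name `url` are assumptions made for illustration, and the URLs are shortened versions of the ones in the question, plus one hypothetical URL with no product ID. It uses `regexpr` (first match only) instead of `gregexpr` and leaves `NA` where a URL has no 13-digit number.

```r
# Hypothetical data frame; the column name `url` is an assumption.
df <- data.frame(
  url = c(
    "https://catalog.macmillan.com/childrens/book/brazen/rebel-ladies-who-rocked-the-world/pnlope-bagieu/9781626728691?utm_source=exacttarget",
    "https://us.macmillan.com/excerpt?isbn=9781250151025&utm_source=exacttarget",
    "https://us.macmillan.com/no-product-id-here"  # made-up URL without an ID
  ),
  stringsAsFactors = FALSE
)

# regexpr() locates the first run of 13 digits in each URL (-1 when none).
m <- regexpr("\\d{13}", df$url)

# Start with NA in every row, then fill in only the rows that matched.
# regmatches() with regexpr() output returns one string per matching row.
df$ISBN <- NA_character_
df$ISBN[m != -1] <- regmatches(df$url, m)

df$ISBN
# [1] "9781626728691" "9781250151025" NA
```

Note that `\\d{13}` also matches the first 13 digits of a longer digit run. If that can occur in your URLs, one option is `regexpr("(?<!\\d)\\d{13}(?!\\d)", df$url, perl = TRUE)`, which only accepts runs of exactly 13 digits.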
Upvotes: 1