Reputation: 873
I'm trying to parse a specific part of a URL after a search, using any language (ideally JavaScript, but I'm open to Python).
How do I get a specific part of a URL and save/store it?
For example, on songkick.com, the way to get an artist_id is to check a specific part of the URL after searching for the artist's name in the site's search bar.
In the case below, the artist ID is 301329.
https://www.songkick.com/artists/301329-rac
I strongly believe there is a way to parse this part using either Python or JS, given that I have a CSV file with artist names in one of its columns, instead of searching for the artists one by one. I'm wondering about an algorithm that iterates over my CSV column, searches for each name, parses the resulting URL, and saves/stores the ID.
I would be very grateful even for just a hint to start with.
Thank you so much, as always.
Upvotes: 0
Views: 127
Reputation: 1636
First, you can simply use a regular expression (RegEx).
In Python:
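import re

url = 'https://www.songkick.com/artists/301329-rac'
pattern = r'/artists/(\d+)-\w'  # raw string so \d reaches the regex engine unchanged
match = re.search(pattern, url)
if match:
    artist_id = match.group(1)  # the captured digits: '301329'
    print(artist_id)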
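To cover the CSV part of the question, here is a minimal sketch that wraps the same pattern in a function and applies it row by row. It assumes a CSV column named `artist_name` and a lookup function that turns a name into the artist's Songkick URL; both are hypothetical placeholders, since the search step itself (an API or the site's search page) isn't covered here. The function names are illustrative only.

```python
import csv
import io
import re
from typing import Callable, Optional

# Same pattern as above: the digits between "/artists/" and the next dash.
ARTIST_PATTERN = re.compile(r"/artists/(\d+)-\w")


def extract_artist_id(url: str) -> Optional[str]:
    """Return the numeric artist ID from a Songkick artist URL, or None."""
    match = ARTIST_PATTERN.search(url)
    return match.group(1) if match else None


def collect_artist_ids(
    csv_text: str,
    lookup_url: Callable[[str], Optional[str]],  # hypothetical: name -> URL or None
    name_column: str = "artist_name",            # hypothetical column name
) -> list[dict[str, str]]:
    """Look up each artist name from the CSV and pair it with its extracted ID."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row[name_column]
        artist_id = extract_artist_id(lookup_url(name) or "")
        rows.append({"artist_name": name, "artist_id": artist_id or ""})
    return rows


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize the results as CSV text; write this string to a file to store it."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["artist_name", "artist_id"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Stand-in for a real search: a hard-coded name -> URL mapping.
    known = {"RAC": "https://www.songkick.com/artists/301329-rac"}
    results = collect_artist_ids("artist_name\nRAC\nUnknown Artist\n", known.get)
    print(to_csv(results))
```

With a real file, you would read it with `open(path, newline="", encoding="utf-8")` and pass its contents instead of the inline string. Names that the lookup can't resolve simply end up with an empty `artist_id`.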
I hope this will help you.
Upvotes: 0
Reputation: 166
It can be done using regular expressions.
Here's an example of a JavaScript implementation:
const url = "https://www.songkick.com/artists/301329-rac";
const regex = /https:\/\/www\.songkick\.com\/artists\/(\d+)-.+/;
const match = url.match(regex); // null when the URL doesn't match
if (match) {
  console.log('Artist ID: ' + match[1]); // match[1] holds the (\d+) capture
} else {
  console.log('No Artist ID found!');
}
This regular expression /https:\/\/www\.songkick\.com\/artists\/(\d+)-.+/
means that we're trying to match something that starts with https://www.songkick.com/artists/, followed by one or more digits (captured as a group), a dash, and then one or more characters of any kind.
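To see what the pattern does and doesn't accept, here is a rough Python translation of the same regular expression (in Python the slashes need no escaping). Only the first URL comes from the question; the other two are made-up variations for illustration.

```python
import re

# Python translation of /https:\/\/www\.songkick\.com\/artists\/(\d+)-.+/
PATTERN = re.compile(r"https://www\.songkick\.com/artists/(\d+)-.+")

samples = [
    "https://www.songkick.com/artists/301329-rac",  # digits, dash, then a name
    "https://www.songkick.com/artists/301329",      # no dash after the digits
    "http://www.songkick.com/artists/301329-rac",   # plain http, not https
]
for s in samples:
    m = PATTERN.search(s)
    print(s, "->", m.group(1) if m else "no match")
```

Only the first sample yields `301329`; the other two print "no match".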
The match() method retrieves the result of matching a string against a regular expression. For a regex without the g flag, it returns an array whose first element (match[0]) is the whole matched string and whose second element (match[1] in our case) is the text captured by the (\d+) group.
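For comparison, a small illustration in Python: the `re` module exposes the same two pieces through `group(0)` (the whole match, like `match[0]`) and `group(1)` (the first capture group, like `match[1]`).

```python
import re

url = "https://www.songkick.com/artists/301329-rac"
match = re.search(r"https://www\.songkick\.com/artists/(\d+)-.+", url)
if match:
    print(match.group(0))  # whole matched text, here the full URL
    print(match.group(1))  # first capture group: 301329
```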
If you're not sure of the protocol (http vs https), you can add a ? in the regex right after https. That makes the s optional, so the regex becomes /https?:\/\/www\.songkick\.com\/artists\/(\d+)-.+/.
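A quick Python check of the relaxed pattern, as a sketch: with `s?` both schemes match. The last lines show a different technique, `urllib.parse.urlparse`, which strips the scheme and host so only the path is left to match.

```python
import re
from urllib.parse import urlparse

# "s?" makes the s optional, so both http and https are accepted.
RELAXED = re.compile(r"https?://www\.songkick\.com/artists/(\d+)-.+")

urls = [
    "https://www.songkick.com/artists/301329-rac",
    "http://www.songkick.com/artists/301329-rac",
]
for url in urls:
    m = RELAXED.search(url)
    print(url, "->", m.group(1) if m else "no match")

# Alternative: drop scheme and host first, then match only the path.
path = urlparse(urls[1]).path
print(path)  # /artists/301329-rac
```

Matching only the path also tolerates a different host (for example, one without `www.`), at the cost of not checking that the URL is on Songkick at all.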
Let me know if you need more explanation.
Upvotes: 1