Reputation: 402
I am requesting a Wikipedia page and printing all of the text that the request returns, like so:
import requests

def my_function(addr):
    # Fetch the page and print the raw HTML
    response = requests.get(addr)
    print(response.text)

my_function("https://en.wikipedia.org/wiki/Web_scraping")
Right now I'm trying to delete the unwanted parts, namely all text before the element with the id 'See_also'. Is there a proper, easy way to do so? I can't just delete a fixed number of lines, since this code is meant to work for different Wikipedia pages.
Upvotes: 2
Views: 177
Reputation: 2625
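You can use a regular expression (hooray). The pattern below matches 'See_also' and everything after it; the character class matches any character, newlines included.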
import requests
import re

def my_function(addr):
    response = requests.get(addr)
    # Keep "See_also" and everything that follows it
    print(re.findall(r"See_also[\s\S]*", response.text))

my_function("https://en.wikipedia.org/wiki/Web_scraping")
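Note that re.findall returns a list, so the output above is a one-element list (or an empty list if the marker is missing). If you would rather get a plain string back, here is a minimal sketch using re.search instead. The helper name text_from_marker and the sample HTML are made up for illustration, and it works on any string, so you can try it without a network request:

```python
import re

def text_from_marker(html, marker="See_also"):
    """Return html from the first occurrence of marker onward, or '' if absent."""
    # re.escape guards against markers that contain regex metacharacters
    match = re.search(re.escape(marker) + r"[\s\S]*", html)
    return match.group(0) if match else ""

# Hypothetical stand-in for response.text
sample = '<p>intro</p><span id="See_also">See also</span><ul><li>Data scraping</li></ul>'
print(text_from_marker(sample))
```

With the real page you would call text_from_marker(response.text). This still cuts in the middle of a tag, just like the one-liner above, so treat it as a quick text filter rather than proper HTML parsing.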
Upvotes: 2