Hellrockerz

Reputation: 9

I need to scrape the Instagram link that is highlighted in the image

Image: This is the link I want to scrape
Image: Description of YT Page that I am trying to scrape

I am trying regex in Python. My problem is how to cut out just the portion "www.instagram.com%2FMohakMeet". I need to know which characters to use in the regex.

#python3
import re
import requests

# g is the iterable of channel URLs (one per line), defined elsewhere
for d in g:
    stripped = d.rstrip()
    url = stripped + "/about"
    print("Retrieving " + url)
    response = requests.get(url)
    data = response.text
    # This is the pattern in question
    link = re.findall('''(www.instagram.com.+?)\s?\"?''', data)

    if not link:
        print('No Link')
        continue  # nothing to print for this URL, so x is never used undefined
    x = str(link[0])
    print("Insta Link", x)
    y = x.replace("%2F", '/', 3)
    print(y)
    # with open('l.txt', 'a') as v:
    #     v.write(y)
    #     v.write("\n")

This is my code. The main problem is that, while scraping, Python picks up the description of the YouTube page shown in the 2nd picture. Please help. This is the pattern that is not working:

    (www.instagram.com.+?)\s?\"?

Upvotes: 1

Views: 75

Answers (1)

jsofri

Reputation: 245

  1. This site lets you debug your regex interactively: https://regex101.com/
  2. A common pitfall when writing a regex is using a standard string ('my string') instead of a raw string (r'my string').
    See also https://docs.python.org/3/library/re.html
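
To make the second point concrete, here is a minimal sketch of a raw-string pattern for this case. The `sample` text is a hypothetical stand-in for `response.text` (the real page markup may differ), and `INSTAGRAM_RE` and `find_instagram_links` are names made up for illustration. It also shows why the pattern from the question captures so little: the lazy `.+?` is followed only by optional tokens, so it stops after a single character.

```python
import re

# Hypothetical snippet standing in for response.text; the real page may differ.
sample = 'q=https%3A%2F%2Fwww.instagram.com%2FMohakMeet" target="_blank"'

# Pattern from the question, written as a raw string. The lazy ".+?" only has
# to match one character because everything after it is optional.
old = re.findall(r'(www.instagram.com.+?)\s?\"?', sample)

# Raw string, escaped dots, and an explicit character class for the handle.
# The separator may be a literal "/" or its URL-encoded form "%2F".
INSTAGRAM_RE = re.compile(r'www\.instagram\.com(?:%2F|/)([A-Za-z0-9_.]+)')

def find_instagram_links(html):
    """Return the Instagram links found in html, in page order, with a plain '/'."""
    return ['www.instagram.com/' + handle for handle in INSTAGRAM_RE.findall(html)]

links = find_instagram_links(sample)
print(old)
print(links)
```

Note that `re.findall` returns matches in the order they appear in the page, so if the channel description also mentions an Instagram URL, the first element may come from the description rather than from the link section. Checking all returned matches, not just the first, may help there.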

Upvotes: 1
