Parsing with re.findall regex between HTML tags leads to EOF error

Question

I would like to extract all link extensions from the text below using re.findall and store the result in a list.

my_text =   
  
 
  Comparer
 
  
  
 
  Comparer
 
  
  
 
  Comparer

I'm trying to get this result:

["link_extension_1.php","link_extension_2.php","link_extension_3.php"]

I tried this:

re.findall(r'\



but got this error:


  SyntaxError: unexpected EOF while parsing

Thanks, Max

Sunitha · Accepted Answer

Your regex works fine for me:

>>> re.findall(r'\



But avoid parsing HTML with regex; use a tool designed for parsing HTML instead, such as BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(my_text, "html.parser")
>>> [div.find('a').get('href') for div in soup.find_all('div', {'class': "2ndclass__img"})]
['link_extension_1.php', 'link_extension_2.php', 'link_extension_3.php']
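If installing bs4 is not an option, the same idea can be sketched with the standard library's html.parser module. This is only a rough sketch, not a drop-in replacement: the sample markup is an assumption (the original my_text is not reproduced here), built from the class name and link names that appear in this thread, and HrefCollector and extract_links are hypothetical helper names.

```python
from html.parser import HTMLParser

# Assumed sample markup: the real my_text is not shown in the thread,
# so this structure is a guess based on the class name and hrefs above.
my_text = """
<div class="2ndclass__img">
  <a href="link_extension_1.php">Comparer</a>
</div>
<div class="2ndclass__img">
  <a href="link_extension_2.php">Comparer</a>
</div>
<div class="2ndclass__img">
  <a href="link_extension_3.php">Comparer</a>
</div>
"""


class HrefCollector(HTMLParser):
    """Collect href values of <a> tags nested inside <div class="2ndclass__img">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # <div> nesting depth inside a matching div (0 = outside)
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Enter a matching div, or track a nested div inside one.
            if self.depth > 0 or "2ndclass__img" in classes:
                self.depth += 1
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def extract_links(html):
    """Hypothetical helper: return hrefs found inside the target divs."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.links


print(extract_links(my_text))
```

Unlike the BeautifulSoup one-liner, which takes only the first link in each div, this sketch collects every link inside a matching div, so the two can differ if a div holds several links.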
