Cannot parse RSS as HTML

Question

I am trying to parse this RSS feed: https://www.mathjobs.org/jobs?joblist-0-----rss

I am using BeautifulSoup, but I cannot make sense of what is going on. I get the output

82
0

(82 title tags and 0 div tags) when I run the following script:

import requests
from bs4 import BeautifulSoup

session = requests.Session()

response = session.get('https://www.mathjobs.org/jobs?joblist-0-----rss')

doc = BeautifulSoup(response.content, 'html.parser')

titles = doc.find_all('title')
print(len(titles))

divs = doc.find_all('div')
print(len(divs))

As far as I understand, the data is in HTML format, with only one title tag and several divs. What is going on here? I got similar results with pyquery.
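One likely explanation, stated as an assumption because the live feed is not reproduced here: the URL returns an RSS document, which is XML, while a browser may show it as a formatted HTML page. In RSS every item carries its own title element, and any HTML inside a description is normally escaped (&lt;div&gt;), so a parser sees it as text rather than as div tags. The sketch below uses Python's standard xml.etree.ElementTree on a small hypothetical feed string (SAMPLE_FEED, not the real mathjobs data) to illustrate both effects.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS sample for illustration only; the real feed's items differ.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example job list</title>
    <item>
      <title>Postdoc in Algebra</title>
      <description>&lt;div&gt;Apply by March 1&lt;/div&gt;</description>
    </item>
    <item>
      <title>Lecturer in Analysis</title>
      <description>&lt;div&gt;Two-year position&lt;/div&gt;</description>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE_FEED)

# One title for the channel plus one title per item.
titles = root.findall(".//title")
print(len(titles))  # 3

# The escaped markup is character data, so no <div> elements exist in the tree.
divs = root.findall(".//div")
print(len(divs))  # 0

# ElementTree decodes &lt; and &gt; in .text, so the description now holds
# literal "<div>...</div>" markup that could be parsed separately as HTML.
first_description = root.find(".//item/description").text
print(first_description)  # <div>Apply by March 1</div>
```

If the real feed follows the same pattern, a large title count and a zero div count are exactly what a parser should report.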

MD. Khairul Basar · Accepted Answer

Make the soup with the lxml parser instead of html.parser (this requires the lxml package to be installed). Change the parsing line to:

doc = BeautifulSoup(response.text, 'lxml')

Here is the full code.

import requests
from bs4 import BeautifulSoup

session = requests.Session()
response = session.get('https://www.mathjobs.org/jobs?joblist-0-----rss')
doc = BeautifulSoup(response.text, 'lxml')
titles = doc.find_all('title')

print(titles)
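As a hedged alternative that avoids HTML parsing rules altogether, the feed can be read with an XML parser, which keeps RSS elements such as link intact. The sketch below assumes the usual RSS 2.0 layout (channel, then item, then title/link/description); the helpers parse_items and DivCounter, the sample string, and the example.org URL are all hypothetical. It also counts div tags inside each unescaped description using the standard html.parser module.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class DivCounter(HTMLParser):
    """Hypothetical helper: counts <div> start tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.count += 1


def parse_items(xml_text):
    """Hypothetical helper: return (title, link, div_count) for each <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # The description text is already unescaped, so parse it as HTML.
        counter = DivCounter()
        counter.feed(item.findtext("description", default=""))
        counter.close()
        items.append((title, link, counter.count))
    return items


# Hypothetical sample. For the real feed, passing response.content from the
# question's requests call should work (untested against the live URL).
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example job list</title>
<item>
  <title>Postdoc in Algebra</title>
  <link>https://example.org/jobs/1</link>
  <description>&lt;div&gt;Apply by March 1&lt;/div&gt;&lt;div&gt;Remote&lt;/div&gt;</description>
</item>
</channel></rss>"""

for title, link, div_count in parse_items(SAMPLE_FEED):
    print(title, link, div_count)
```

With the sample above this prints one line for the single item, reporting two div tags found inside its description.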
