Reputation: 1172
I am trying to parse this RSS feed: https://www.mathjobs.org/jobs?joblist-0-----rss
I tried using BeautifulSoup, but I cannot make sense of the result. I get the output
82
0
when I use the following script.
import requests
from bs4 import BeautifulSoup
session = requests.session()
response = session.get('https://www.mathjobs.org/jobs?joblist-0-----rss')
doc = BeautifulSoup(response.content,'html.parser')
titles = doc.find_all('title')
print( len(titles) )
divs = doc.find_all('div')
print( len(divs) )
As far as I understand, the data is given in HTML format, and there is only one title tag and several divs, so I expected different counts. What is going on here? I got similar results using pyquery.
Upvotes: 0
Views: 34
Reputation: 5110
The soup needs to be made with a different parser before you query it with BeautifulSoup.
Replace the line that builds doc with this one: doc = BeautifulSoup(response.text,'lxml')
Here is the full code.
import requests
from bs4 import BeautifulSoup
session = requests.session()
response = session.get('https://www.mathjobs.org/jobs?joblist-0-----rss')
doc = BeautifulSoup(response.text,'lxml')
titles = doc.find_all('title')
print(titles)
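As for why the counts looked odd: a minimal sketch, assuming the feed follows the usual RSS 2.0 layout (one title for the channel plus one per item, with any HTML in the description stored as escaped text). The sample feed, summarize_feed and DivCounter below are made up for illustration, and the sketch uses the standard library's xml.etree.ElementTree instead of BeautifulSoup so it runs without a network call or extra installs.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical two-item feed standing in for the real response body.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example job list</title>
    <link>https://example.org/jobs</link>
    <item>
      <title>Postdoc in Algebra</title>
      <link>https://example.org/jobs/1</link>
      <description>&lt;div&gt;Apply by March&lt;/div&gt;</description>
    </item>
    <item>
      <title>Lecturer in Analysis</title>
      <link>https://example.org/jobs/2</link>
      <description>&lt;div&gt;Apply by April&lt;/div&gt;</description>
    </item>
  </channel>
</rss>
"""


class DivCounter(HTMLParser):
    """Counts opening <div> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.count += 1


def summarize_feed(xml_text):
    """Return tag counts that mirror the question's two find_all calls."""
    root = ET.fromstring(xml_text)

    # Every <title> in the document: the channel's own plus one per item.
    all_titles = [t.text for t in root.iter("title")]

    # <div> as a real XML element: none, because the HTML is escaped text.
    xml_divs = list(root.iter("div"))

    # Only the per-item titles, which is usually what you want to list.
    item_titles = [item.findtext("title") for item in root.iter("item")]

    # The divs only appear after parsing each description as HTML.
    counter = DivCounter()
    for item in root.iter("item"):
        counter.feed(item.findtext("description") or "")

    return {
        "all_titles": len(all_titles),
        "xml_divs": len(xml_divs),
        "item_titles": item_titles,
        "divs_in_descriptions": counter.count,
    }


if __name__ == "__main__":
    print(summarize_feed(SAMPLE_RSS))
```

If the real feed is laid out like this, 82 titles would simply be the channel title plus one per job, and 0 divs would mean the divs live inside the description text rather than in the document tree. With BeautifulSoup you can get the same tree-style access by passing 'xml' as the parser (it needs lxml installed) and then parsing each description separately as HTML.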
Upvotes: 2