Problems getting article content while scraping a news website using Beautiful Soup

Question

I am trying to scrape news articles from an RSS feed, along with details such as the title, description, URL, and date. The description column does not contain the entire article content, as I expected it would. My code is below.

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.business-standard.com/rss/economy-policy-102.rss'
resp = requests.get(url)
soup = bs(resp.content, features='xml')  # parse the feed as XML
items = soup.find_all('item')
news_items = []

# Collect the fields of every <item> element in the feed
for item in items:
    news_item = {}
    news_item['title'] = item.title.text
    news_item['description'] = item.description.text
    news_item['link'] = item.link.text
    news_item['pubDate'] = item.pubDate.text
    news_items.append(news_item)

df = pd.DataFrame(news_items, columns=['title', 'description', 'link', 'pubDate'])
df.loc[0, 'description']  # description of the first item

Output obtained - 'The re-import in the extended period would be without payment of basic customs duty and integrated goods and services tax'

As seen above, I am getting only a one-sentence summary rather than the full article content. What changes should be made?


Answers (1)
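
The description element of an RSS item normally holds only a short synopsis, which matches the output shown in the question. The full text would have to come from the article page that each link points to. Below is a minimal sketch of that idea, not a tested solution for this particular site. The helper names extract_article_text and fetch_article_text are hypothetical, and the sketch assumes that the article body sits in ordinary <p> tags and that the site serves the page to a script; neither assumption has been checked. Collecting every <p> tag may also pick up navigation or footer text, so a narrower selector would likely be needed after inspecting the page markup.

```python
import requests
from bs4 import BeautifulSoup


def extract_article_text(html):
    """Join the text of every non-empty <p> tag in an HTML document.

    Assumption: the article body is marked up as plain <p> tags.
    """
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = [p.get_text().strip() for p in soup.find_all("p")]
    return "\n".join(text for text in paragraphs if text)


def fetch_article_text(link, timeout=10):
    """Download one article page and return its paragraph text.

    Assumption: the site answers a plain GET request; some sites
    reject requests without a browser-like User-Agent header.
    """
    headers = {"User-Agent": "Mozilla/5.0"}
    resp = requests.get(link, headers=headers, timeout=timeout)
    resp.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_article_text(resp.text)
```

With the DataFrame from the question, a new column could then be filled with df['content'] = df['link'].apply(fetch_article_text). This issues one request per article, so a short delay between requests may be appropriate.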
