user12205480

Display output as CSV in Python using Pandas

Below is my code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all('article'):
    headline = article.a.text
    summary=article.p.text
    link = "https://www.vanglaini.org" +article.a['href']
    #print(headline)
    #print(summary)
    #print(link)

#print()

news_csv = pd.DataFrame({'Headline': headline,
                         'Summary': summary,
                        'Link' : link,


                         })
print(news_csv)

I get this error:

headline = article.a.text
AttributeError: 'NoneType' object has no attribute 'text'

Help!

Upvotes: 0

Views: 106

Answers (1)

furas

Reputation: 142734

As already pointed out in my comments and in @AmiTavory's (deleted) answer, not every article contains a link. For those articles article.a is None, so article.a.text becomes None.text, which raises the error.

You have to check whether article.a is None and skip that article, like this:

import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all('article'):
    if article.a is None:
        continue  # skip articles that have no link

    headline = article.a.text
    summary = article.p.text
    link = "https://www.vanglaini.org" + article.a['href']
    print(headline)
    print(summary)
    print(link)

With this check the loop skips articles without a link, and it works.
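If you want to test the check without requesting the real page, here is a minimal sketch. The HTML in SAMPLE_HTML is made up by me (it is not the real markup of the site), extract_articles is a hypothetical helper name, and I use the built-in "html.parser" so that lxml is not needed. The extra check on article.p is my assumption: the same None problem could appear when an article has no paragraph.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page (an assumption, not the site's HTML):
# the second <article> has no <a>, the third has no <p>.
SAMPLE_HTML = """
<article><a href="/first">First headline</a><p>First summary</p></article>
<article><p>Teaser without a link</p></article>
<article><a href="/third">Third headline</a></article>
"""

BASE_URL = "https://www.vanglaini.org"


def extract_articles(html):
    """Return (headline, summary, link) tuples, skipping articles without a link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article"):
        if article.a is None:
            continue  # no link: article.a.text would raise AttributeError
        # Assumption: an article may also lack <p>, so guard it the same way.
        summary = article.p.text.strip() if article.p is not None else ""
        headline = article.a.text.strip()
        link = BASE_URL + article.a["href"]
        rows.append((headline, summary, link))
    return rows


rows = extract_articles(SAMPLE_HTML)
for row in rows:
    print(row)
```

The article without a link is skipped, and the one without a paragraph gets an empty summary instead of raising an error.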


EDIT: You may now get a different error:

ValueError: If using all scalar values, you must pass an index

It has a completely different cause, so it should really be asked as a new question.

The problem is in the DataFrame call: after the loop, headline, summary and link hold only the last values (single strings), but DataFrame expects lists, as in

{
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link' : list_with_links,
}
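To see the difference in isolation, here is a small sketch with made-up values (the strings and the example link are only placeholders). Passing only single strings raises the ValueError, while wrapping the same values in lists works.

```python
import pandas as pd

# Placeholder values, standing in for what is left after the loop.
headline = "Only the last headline"
link = "https://www.vanglaini.org/example"  # hypothetical link

# All values are scalars (single strings), so pandas cannot tell how many rows to build.
try:
    pd.DataFrame({"Headline": headline, "Link": link})
    scalar_error = None
except ValueError as exc:
    scalar_error = str(exc)

print(scalar_error)

# The same values wrapped in lists give a DataFrame with one row.
news = pd.DataFrame({"Headline": [headline], "Link": [link]})
print(news.shape)
```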

Create empty lists before the for-loop:

list_with_headlines = []
list_with_summaries = []
list_with_links = []

Inside the for-loop, append() the values to the lists:

list_with_headlines.append(headline)
list_with_summaries.append(summary)
list_with_links.append(link)

After the loop, create the DataFrame from the lists:

news_csv = pd.DataFrame({
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link' : list_with_links,
})
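As an alternative to three separate lists, you could collect one dictionary per article and pass the list of dictionaries to DataFrame. This is only a sketch of that variant, and the scraped values below are placeholders, not real data from the site.

```python
import pandas as pd

# Placeholder tuples standing in for values scraped inside the for-loop.
scraped = [
    ("Headline 1", "Summary 1", "https://www.vanglaini.org/a"),
    ("Headline 2", "Summary 2", "https://www.vanglaini.org/b"),
]

rows = []
for headline, summary, link in scraped:
    # One dictionary per article; the keys become the column names.
    rows.append({"Headline": headline, "Summary": summary, "Link": link})

news_csv = pd.DataFrame(rows)
print(news_csv)
```

Both versions give the same columns. With one dictionary per row, the three values of an article cannot get out of step with each other.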

Full code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')

list_with_headlines = []
list_with_summaries = []
list_with_links = []

for article in soup.find_all('article'):
    if article.a is None:
        continue  # skip articles that have no link
    headline = article.a.text.strip()
    summary = article.p.text.strip()
    link = "https://www.vanglaini.org" + article.a['href']
    list_with_headlines.append(headline)
    list_with_summaries.append(summary)
    list_with_links.append(link)

news_csv = pd.DataFrame({
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link' : list_with_links,
})

print(news_csv)
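print(news_csv) shows the table layout, not CSV. Since the question asks for CSV, here is a minimal sketch using to_csv(). The rows are placeholders, and the file name "news.csv" in the comment is only an example. Called without a path, to_csv() returns the CSV text as a string.

```python
import pandas as pd

# Placeholder rows standing in for the scraped data.
news_csv = pd.DataFrame({
    "Headline": ["Headline 1", "Headline 2"],
    "Summary": ["Summary, with a comma", "Summary 2"],
    "Link": ["https://www.vanglaini.org/a", "https://www.vanglaini.org/b"],
})

# Without a path, to_csv() returns a string; index=False drops the row numbers.
csv_text = news_csv.to_csv(index=False)
print(csv_text)

# To write a file instead (the file name is only an example):
# news_csv.to_csv("news.csv", index=False, encoding="utf-8")
```

Values that contain a comma are put in quotes automatically, so the columns stay aligned when the file is read back.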

Upvotes: 1
