DTrain

Reputation: 33

BeautifulSoup to CSV

I have set up BeautifulSoup to find a specific class on two webpages.

How can I write each URL's result to its own cell in a single CSV?

Also, is there a limit to the number of URLs I can read? I would like to expand this to about 200 URLs once it is working.

The class is always the same, and I don't need any formatting: just the raw HTML, one cell per URL.

Thanks for any ideas.

from bs4 import BeautifulSoup
import requests
urls = ['https://www.ozbargain.com.au/','https://www.ozbargain.com.au/forum']
for u in urls:
    response = requests.get(u)
    data = response.text
    soup = BeautifulSoup(data,'lxml')
    soup.find('div', class_="block")

Upvotes: 1

Views: 1041

Answers (1)

Hryhorii Pavlenko

Reputation: 3910

Use pandas to work with tabular data: `pd.DataFrame` builds the table, and `DataFrame.to_csv` saves it as CSV (it's worth checking the documentation for options such as append mode via `mode='a'`).

That's basically it:

import requests
import pandas as pd
from bs4 import BeautifulSoup


def scrape(urls):
    # Yield one row per URL; pandas stringifies the Tag when saving to CSV.
    for url in urls:
        data = requests.get(url).text
        soup = BeautifulSoup(data, 'lxml')
        yield {
            "url": url,
            "raw_html": soup.find('div', class_="block"),
        }


urls = ['https://www.ozbargain.com.au/', 'https://www.ozbargain.com.au/forum']

table = pd.DataFrame(scrape(urls))
table.to_csv("output.csv", index=False)
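If you scale this up to hundreds of URLs, the append mode mentioned above lets you save results in batches rather than holding everything in memory. A minimal sketch of that idea (the `append_rows` helper, the file name, and the sample rows are illustrative, not part of the original answer):

```python
import os
import pandas as pd


def append_rows(rows, path="batched_output.csv"):
    # Append a batch of rows to the CSV; write the header only
    # when the file does not exist yet.
    df = pd.DataFrame(rows)
    df.to_csv(path, mode="a", header=not os.path.exists(path), index=False)


# Two batches end up as consecutive rows in one file.
append_rows([{"url": "https://example.com/a", "raw_html": "<div>a</div>"}])
append_rows([{"url": "https://example.com/b", "raw_html": "<div>b</div>"}])
```

The `header=not os.path.exists(path)` check keeps the column names from being repeated in the middle of the file on later batches.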

Upvotes: 1
