web scraping info and printing it in a csv file

Question

I need to parse info from an HTML page with Python (BeautifulSoup or Scrapy), then write it to a CSV file. The relevant info is the file names and the number of times each has been viewed in my account, here.

Relevant HTML concerning number of times:


      num 
      num

Relevant HTML for file names:


       {filename}

What I was able to do:

import requests
from bs4 import BeautifulSoup

# Fetch the second page of the account listing
page = requests.get("https://archive.org/details/%40kareem76?&sort=-publicdate&page=2")
nbr = BeautifulSoup(page.content, 'html.parser')
# Find the divs that hold the view counts
nbr.find_all('div', class_='hidden-tiles views C C1')

sentence · Accepted Answer

This code should do the job:

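import requests
from bs4 import BeautifulSoup
import pandas as pd

# Download the account page
html = requests.get("https://archive.org/details/@kareem76").text

soup = BeautifulSoup(html, 'html.parser')
# File names live in <div class="ttl">
titles = [i.text.strip() for i in soup.find_all('div', class_='ttl')]
# View counts live in the first <nobr> inside each views div
views = [i.find('nobr').text for i in soup.find_all('div', class_='hidden-tiles views C C1')]

# Both lists must have the same length, or pandas raises a ValueError
df = pd.DataFrame({'titles': titles,
                   'views': views})

# Write the header row and omit the DataFrame index
df.to_csv("titles-views.csv",
          mode='w',
          index=False,
          header=True)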

The result is a file named titles-views.csv with two columns, titles and views, and one row per item.
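If pandas is not available, the same two-column file can be written with the standard-library csv module. The following is only a minimal sketch, not part of the original answer: the helper name write_title_views and the sample values are made up for illustration, and it assumes titles and views are the two lists built by the scraping code.

```python
import csv
import io


def write_title_views(titles, views, out):
    """Write paired titles and view counts as CSV rows to the file-like object `out`."""
    # Refuse to pair the lists if the scrape returned different counts.
    if len(titles) != len(views):
        raise ValueError(f"got {len(titles)} titles but {len(views)} view counts")
    writer = csv.writer(out)
    writer.writerow(["titles", "views"])   # header row
    writer.writerows(zip(titles, views))   # one row per item


# Made-up sample values standing in for the scraped lists.
buffer = io.StringIO()
write_title_views(["File A", "File B"], ["1,234", "56"], buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, pass a file opened with open("titles-views.csv", "w", newline="", encoding="utf-8"). The csv writer quotes any value that contains a comma, so a view count such as 1,234 stays in a single column.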
