Beautifulsoup extract <li> and <ul> tags and write results to CSV

Question

I'm trying to extract the text inside all lines ('li') from the following:

                  Year 1    Year 2    Year 3    Year 4    Year 5    5 Yr Total
    Depreciation  $5,390    $1,658    $1,459    $1,293    $1,161    $10,961
    Taxes & Fees  $1,424    $61       $61       $61       $61       $1,668
    Financing     $1,022    $817      $603      $375      $135      $2,952

To get to this point I used the following:

import requests
from bs4 import BeautifulSoup
import csv
page = requests.get('https://www.edmunds.com/ford/escape/2017/cost-to-own/')
soup = BeautifulSoup(page.content, 'html.parser')
data = soup.find_all("ul", {"id": "tco_detail_data"})

Now, to extract all the lines under class="first", I used:

details = soup.find_all("li", {"class":"first"})

However, it will only get the first parent li tag and the child li tags under it. How can I repeat the process to select each li class="first" section and write the results to CSV? I would appreciate any guidance.

cmaher · Accepted Answer

Here's a similar approach to the previous answer, which will give you the table from the web page in nested list form (i.e. [[table row], [table row], ...]):

data = soup.find_all("ul", {"id": "tco_detail_data"})

# get all list elements
lis = data[0].find_all('li')

# add a helper lambda, just for readability
find_ul = lambda x: x.find_all('ul')
uls = [find_ul(elem) for elem in lis if find_ul(elem) != []]

# use a nested list comprehension to iterate over the <ul> tags
# and extract text from each <li> into sublists
text = [[li.text.encode('utf-8') for li in ul[0].find_all('li')] for ul in uls]

# [
#   ['\xc2\xa0', 'Year 1', 'Year 2', 'Year 3', 'Year 4', 'Year 5', '5 Yr Total'],
#   ['Depreciation', '$4,853', '$1,658', '$1,459', '$1,293', '$1,161', '$10,424'],
#   ['Taxes & Fees', '$2,057', '$21', '$66', '$21', '$66', '$2,231'],
#   ['Financing', '$1,026', '$821', '$605', '$376', '$136', '$2,964'],
#   ['Fuel', '$1,606', '$1,654', '$1,704', '$1,755', '$1,808', '$8,527'],
#   ['Insurance', '$764', '$791', '$818', '$847', '$877', '$4,097'],
#   ['Maintenance', '$230', '$601', '$385', '$1,653', '$1,504', '$4,373'],
#   ['Repairs', '$0', '$0', '$109', '$257', '$374', '$740'],
#   ['Tax Credit', '$0', '', '', '', '', '$0'],
#   ['True Cost to Own \xc2\xae', '$10,536', '$5,546', '$5,146', '$6,202', '$5,926', '$33,356']
# ]

# write "text" list to csv
with open('ford_escape_2017.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(text)
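
The byte-string output shown above reflects Python 2. On Python 3, .encode('utf-8') returns bytes, so csv.writer would write cells such as b'Year 1'. Below is a minimal Python 3 sketch of the same idea that keeps the cells as str. The inline SAMPLE_HTML is a hypothetical stand-in for the page markup (the real page may be structured differently or may have changed), and rows_from_html and write_rows are illustrative names, not part of BeautifulSoup.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup, assumed to mirror the nesting described in the question:
# an outer <ul id="tco_detail_data"> whose <li> items each wrap an inner <ul> row.
SAMPLE_HTML = """
<ul id="tco_detail_data">
  <li>
    <ul>
      <li class="first">&nbsp;</li>
      <li>Year 1</li>
      <li>Year 2</li>
    </ul>
  </li>
  <li>
    <ul>
      <li class="first">Depreciation</li>
      <li>$5,390</li>
      <li>$1,658</li>
    </ul>
  </li>
</ul>
"""


def rows_from_html(html):
    """Return one list of cell strings per inner <ul> row."""
    soup = BeautifulSoup(html, "html.parser")
    outer = soup.find("ul", id="tco_detail_data")
    if outer is None:
        return []
    rows = []
    # recursive=False keeps only direct children, so the nested <li> cells
    # are not mistaken for row containers.
    for item in outer.find_all("li", recursive=False):
        inner = item.find("ul")
        if inner is None:
            continue
        # strip=True trims surrounding whitespace, including non-breaking spaces.
        cells = [li.get_text(strip=True)
                 for li in inner.find_all("li", recursive=False)]
        rows.append(cells)
    return rows


def write_rows(rows, fileobj):
    """Write the nested list to any text file-like object as CSV."""
    csv.writer(fileobj).writerows(rows)


rows = rows_from_html(SAMPLE_HTML)

# An in-memory buffer keeps the sketch free of side effects.
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass write_rows a file opened with open('ford_escape_2017.csv', 'w', newline='', encoding='utf-8'). The newline='' argument is what the csv module documentation recommends so that row terminators are not translated a second time.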
