Robert Grootjen

Reputation: 197

How can I put this info in columns?

My code works, but each table row ends up in a single cell and I need the values in separate columns. Can anyone help me with this? Thanks in advance.

from bs4 import BeautifulSoup
import requests
import csv


#Request webpage content
result = requests.get('https://www.solar.com/learn/solar-panel-cost/')

#Save content in var
src = result.content

#soupactivate
soup = BeautifulSoup(src,'lxml')


#Open CSV
file = open('priceperwatt','w')
writer = csv.writer(file)

for tr in soup.findAll('tr'):
    rowtext = tr.get_text()
    writer.writerow([rowtext])

file.close()

Upvotes: 0

Views: 32

Answers (1)

Bugbeeb

Reputation: 2161

I made some improvements to your code. The primary issue is that the scraped data doesn't fit into a single rectangular table, because the first few rows don't have the same number of elements as the rest. But once you get to ['State', 'Market Price Per Watt', 'Solar.com Price Per Watt'] you can use that row as the column headers. I also open the file with newline='' for both reading and writing, which is what the csv module expects and avoids extra blank lines between rows on some platforms, and I use with blocks so the file is closed automatically.

from bs4 import BeautifulSoup
import requests
import csv


# Request the webpage content
result = requests.get('https://www.solar.com/learn/solar-panel-cost/')

# Keep the raw response body
src = result.content

# Parse the HTML
soup = BeautifulSoup(src, 'lxml')


# Write the text of each table row to the file as a single field
with open('priceperwatt', 'w', newline='') as file:
    writer = csv.writer(file)

    for tr in soup.find_all('tr'):
        rowtext = tr.get_text()
        writer.writerow([rowtext])

# Read the rows back and split each one into its cells
with open('priceperwatt', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        row = ''.join(row).strip('\n').split('\n')
        print(row)

Output:

['Solar Price Per Watt', 'Solar Price Per Kilowatt Hour']
['GROSS system cost / Total system wattage', 'NET system cost / Total lifetime system production']
['Useful for comparing solar quotes against one another', 'Useful for comparing solar versus utility bill']
['Pertains to the POWER of a system', 'Pertains to the PRODUCTION of a system']
['Typically $3.00-4.00/watt', 'Typically $0.06-0.08/kWh']
['State', 'Market Price Per Watt', 'Solar.com Price Per Watt']
['Arizona', '$3.61/W', '$3.39/W']
['California', '$4.31/W', '$3.76/W']
['Connecticut', '$3.65/W', '$3.68/W']
['Florida', '$3.45/W', '$2.82/W']
['Massachusetts', '$4.18/W', '$3.92/W']
['Maryland', '$3.93/W', '$3.64/W']
['Minnesota', '$4.61/W', '$3.66/W']
['New Hampshire', '$3.72/W', '$3.37/W']
['New Mexico', '$4.82/W', '$3.56/W']
['Oregon', '$3.79/W', '$3.68/W']
['Texas', '$3.83/W', '$3.17/W']
['Wisconsin', '$3.29/W', '$3.83/W']
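
In case it helps to see why the join/strip/split line produces separate cells: the text of a row comes back as one string with the cell values separated by newlines, and the csv module stores that whole string as a single quoted field. Here is a minimal sketch using only the standard library. The `rowtext` value is an assumed example of what `tr.get_text()` returns for one row, not something fetched from the page, and `io.StringIO` stands in for the real file.

```python
import csv
import io

# Assumed shape of tr.get_text() for one table row: cell texts separated by newlines
rowtext = "\nArizona\n$3.61/W\n$3.39/W\n"

# Writing the whole string as a single field keeps it in one (quoted) CSV column
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow([rowtext])

# Reading it back returns that same single field, newlines included
buffer.seek(0)
row = next(csv.reader(buffer))

# Splitting on the newlines is what actually produces the separate cells
cells = "".join(row).strip("\n").split("\n")
print(cells)
```

So the columns come from the split, not from the CSV round trip. The same split could probably be applied to `rowtext` right after `get_text()`, before anything is written.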

Finally, load the rows into a pandas DataFrame, using the header row as the column names:

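import csv
import pandas as pd

# Collect every row as a list of cell strings
lst = []
with open('priceperwatt', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        row = ''.join(row).strip('\n').split('\n')
        lst.append(row)

# Row 5 is the header row; the state data starts at row 6
df = pd.DataFrame(lst[6:], columns=lst[5])
print(df)

# Optional: save the columns to a CSV file (the filename is only an example)
df.to_csv('priceperwatt_columns.csv', index=False)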
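
If you would rather not hard-code the positions 5 and 6, or don't want pandas at all, something along these lines might work. It is only a sketch: `table_from_header` is a helper name I made up, the sample rows are copied from the printed output above, and `io.StringIO` stands in for a real file opened with `newline=''`.

```python
import csv
import io

# Sample rows copied from the printed output above
rows = [
    ['Typically $3.00-4.00/watt', 'Typically $0.06-0.08/kWh'],
    ['State', 'Market Price Per Watt', 'Solar.com Price Per Watt'],
    ['Arizona', '$3.61/W', '$3.39/W'],
    ['California', '$4.31/W', '$3.76/W'],
]


def table_from_header(rows, first_header):
    """Return the rows starting at the header whose first cell is first_header."""
    for index, row in enumerate(rows):
        if row and row[0] == first_header:
            return rows[index:]
    raise ValueError(f"no header row starting with {first_header!r}")


table = table_from_header(rows, 'State')

# io.StringIO stands in for open('some_file.csv', 'w', newline='')
out = io.StringIO(newline='')
csv.writer(out).writerows(table)  # one list element per column
print(out.getvalue())
```

Because each row is passed to the writer as a list of cells rather than as one string, every value lands in its own column. With BeautifulSoup you can also build those lists directly from the cells, for example with `tr.find_all(['th', 'td'])`, instead of splitting the row text.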

Upvotes: 1
