Reputation: 197
My code works, but I need the output split into columns. Can anyone help me with this? Thank you in advance.
from bs4 import BeautifulSoup
import requests
import csv
#Request webpage content
result = requests.get('https://www.solar.com/learn/solar-panel-cost/')
#Save content in var
src = result.content
#Parse the HTML with BeautifulSoup
soup = BeautifulSoup(src,'lxml')
#Open CSV
file = open('priceperwatt','w')
writer = csv.writer(file)
#Write the text of each table row as a single field
for tr in soup.findAll('tr'):
    rowtext = tr.get_text()
    writer.writerow([rowtext])
file.close()
Upvotes: 0
Views: 32
Reputation: 2161
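So I made some improvements to your code. The primary issue is that the scraped data doesn't fit into a single rectangular table, because the first five rows contain two elements each while the remaining rows contain three. But once you get to ['State', 'Market Price Per Watt', 'Solar.com Price Per Watt'] you can use that row as the column headers. My changes: the file is now opened with newline='' (an argument to open(), not to the csv reader or writer), which lets the csv module handle line endings itself and avoids blank lines between rows on some platforms. The file is then read back, and each row's text is split on its embedded newlines into separate fields.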
from bs4 import BeautifulSoup
import requests
import csv
#Request webpage content
result = requests.get('https://www.solar.com/learn/solar-panel-cost/')
#Save content in var
src = result.content
#soupactivate
soup = BeautifulSoup(src,'lxml')
#Open CSV
with open('priceperwatt','w', newline='') as file:
    writer = csv.writer(file)
    for tr in soup.findAll('tr'):
        rowtext = tr.get_text()
        writer.writerow([rowtext])
with open('priceperwatt','r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        row = ''.join(row).strip('\n').split('\n')
        print(row)
output:
['Solar Price Per Watt', 'Solar Price Per Kilowatt Hour']
['GROSS system cost / Total system wattage', 'NET system cost / Total lifetime system production']
['Useful for comparing solar quotes against one another', 'Useful for comparing solar versus utility bill']
['Pertains to the POWER of a system', 'Pertains to the PRODUCTION of a system']
['Typically $3.00-4.00/watt', 'Typically $0.06-0.08/kWh']
['State', 'Market Price Per Watt', 'Solar.com Price Per Watt']
['Arizona', '$3.61/W', '$3.39/W']
['California', '$4.31/W', '$3.76/W']
['Connecticut', '$3.65/W', '$3.68/W']
['Florida', '$3.45/W', '$2.82/W']
['Massachusetts', '$4.18/W', '$3.92/W']
['Maryland', '$3.93/W', '$3.64/W']
['Minnesota', '$4.61/W', '$3.66/W']
['New Hampshire', '$3.72/W', '$3.37/W']
['New Mexico', '$4.82/W', '$3.56/W']
['Oregon', '$3.79/W', '$3.68/W']
['Texas', '$3.83/W', '$3.17/W']
['Wisconsin', '$3.29/W', '$3.83/W']
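To see why the join/strip/split step recovers the cells, here is a minimal sketch that runs the same round trip in memory. The `row_text` value is an assumption about what `tr.get_text()` returns for a table row (cell texts separated by newlines); the real page may differ slightly.

```python
import csv
import io

# Assumed shape of tr.get_text() for one table row: each cell's text
# is separated from its neighbours by a newline.
row_text = "\nArizona\n$3.61/W\n$3.39/W\n"

# Write the text as a single CSV field, as the loop above does.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow([row_text])

# Read it back. The csv module quoted the embedded newlines, so the
# whole row comes back as one field.
buffer.seek(0)
fields = next(csv.reader(buffer))

# Join the fields, trim the outer newlines, and split on the inner ones.
cells = "".join(fields).strip("\n").split("\n")
print(cells)
```

The in-memory buffer stands in for the `priceperwatt` file only so the sketch has no side effects.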
Finally, load the rows into a pandas DataFrame, using the header row (index 5) as the column names:
import csv
import pandas as pd
lst = []
with open('priceperwatt','r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        row = ''.join(row).strip('\n').split('\n')
        lst.append(row)
#Row 5 holds the column headers; the state rows start at row 6
df = pd.DataFrame(lst[6:], columns=lst[5])
print(df)
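If the goal is a CSV file that already has one column per field, the split rows can also be written back out with the standard `csv` module. The following is only a sketch: the `lst` values are a few rows copied from the printed output above, and the header row is located by value rather than by the hard-coded index 5, which is an assumption about what is more robust if the page gains or loses an introductory row.

```python
import csv
import io

# A few rows copied from the printed output above; in the real script
# these would be the contents of lst.
lst = [
    ['Solar Price Per Watt', 'Solar Price Per Kilowatt Hour'],
    ['Typically $3.00-4.00/watt', 'Typically $0.06-0.08/kWh'],
    ['State', 'Market Price Per Watt', 'Solar.com Price Per Watt'],
    ['Arizona', '$3.61/W', '$3.39/W'],
    ['California', '$4.31/W', '$3.76/W'],
]

# Locate the header row by value, then keep it and everything after it.
header = ['State', 'Market Price Per Watt', 'Solar.com Price Per Watt']
start = lst.index(header)
table = lst[start:]

# Write one CSV field per cell. An in-memory buffer is used here to
# avoid touching the disk; a real script would use
# open('priceperwatt.csv', 'w', newline='') instead (hypothetical filename).
out = io.StringIO(newline="")
csv.writer(out).writerows(table)
csv_text = out.getvalue()
print(csv_text)
```

Opening the resulting file in a spreadsheet should then show State, Market Price Per Watt, and Solar.com Price Per Watt as three separate columns.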
Upvotes: 1