D. Shaat

Reputation: 25

How to save results of web scraping Python

I am trying to scrape LexisNexis. I would like to retrieve the headline, source, and date of each news story. Below is the code I wrote to run after using Selenium to perform the search for me. I am having trouble saving the data into a CSV file: I keep getting encoding errors, and when I am not getting encoding errors, the data comes through with MANY spaces and stray characters like \t\t\t\t and \n.

Here is an example of what I retrieve:

["\n \t\t\t\tNetworks Continue Hammering Indiana for Sparking a 'Firestorm' Over Religious Freedom Law\n \t\t\t", "\n \t\t\t\tAll Three Networks Pile on Indiana's 'Controversial' Law\n \t\t\t", "\n \t\t\t\tABC Continues Obsessively Bashing 'Controversial' 'Religious Freedom' Bill\n \t\t\t", "\n \t\t\t\tABC, NBC Rush to Paint Trump as a 'Moderate,' 'Trump 2.0'\n \t\t\t", '\n \t\t\t\tCBS Hits the Panic Button, Rails Against Religious Freedom Bills in Georgia, North Carolina\n \t\t\t', '\n \t\t\t\tJihad Report - October 7, 2016\n \t\t\t', '\n \t\t\t\tEducation News Roundup: May 2, 2016\n \t\t\t', '\n \t\t\t\tNBC CBS Keep Up Attack on Religious Freedom Laws\n \t\t\t', '\n \t\t\t\tNBC Slams Indiana Religious Freedom Law...Then Starts Week-Long Series on Faith\n \t\t\t', "\n \t\t\t\tNetworks Again Bash Indiana for Causing 'National Outcry' and 'Uproar' Over Religious Freedom Law\n \t\t\t"]

This is the case for headlines, dates, and sources. I am not sure what I am doing wrong here.

scd = browser.page_source
soup = BeautifulSoup(scd, "lxml")


headlines = []
for headline in soup.findAll('a', attrs={"data-action": "title"}):
    head_line = headline.get_text()
    #head_line.strip('a>, <a data-action="title" href="#">')
    #head_line.encode('utf-8')
    Headlines = head_line.encode()
    headlines.append(head_line)

sources = []
for sources in soup.findAll('a', attrs={"class": "rightpanefiltercontent notranslate", "href": "#"}):
    source_only = sources.get_text()
    source_only.encode('utf-8')
    sources.append(source_only)
Sources = sources.encode()

dates = []
for dates in soup.findAll('a', attrs={"class": "rightpanefiltercontent"}):
    date_only = dates.get_text()
    date_only.strip('<a class="rightpanefiltercontent" href="#">')
    date_only.encode()
    dates.append(date_only)
Dates = dates.encode()

news = [Headlines, Sources, Dates]


result = "/Users/danashaat/Desktop/Tornadoes/IV Search News Results/data.csv"
with open(result, 'w') as result:
    newswriter = csv.writer(result, dialect='excel')
    newswriter.writerow(News)
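For comparison, here is a minimal sketch (with placeholder data, not the scraped variables) of the row-wise pattern this code seems to be aiming for: one writerow per record, zipping the three parallel lists, with an explicit UTF-8 encoding to sidestep the encoding errors:

```python
import csv

# Hypothetical sketch with placeholder data: write one CSV row per record
# by zipping the three parallel lists, instead of one giant row per list.
headlines = ["Headline A", "Headline B"]
sources = ["Source A", "Source B"]
dates = ["2015-03-30", "2016-10-07"]

# newline="" avoids blank lines on Windows; encoding="utf-8" avoids encode errors.
with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, dialect="excel")
    writer.writerow(["headline", "source", "date"])   # header row
    writer.writerows(zip(headlines, sources, dates))  # one row per record
```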

Also, here is the result when I find the headlines:

[<a data-action="title" href="#"> Networks Continue Hammering Indiana for Sparking a 'Firestorm' Over Religious Freedom Law </a>, <a data-action="title" href="#"> All Three Networks Pile on Indiana's 'Controversial' Law </a>, <a data-action="title" href="#"> ABC Continues Obsessively Bashing 'Controversial' 'Religious Freedom' Bill </a>, <a data-action="title" href="#"> ABC, NBC Rush to Paint Trump as a 'Moderate,' 'Trump 2.0' </a>, <a data-action="title" href="#"> CBS Hits the Panic Button, Rails Against Religious Freedom Bills in Georgia, North Carolina </a>, <a data-action="title" href="#"> Jihad Report - October 7, 2016 </a>, <a data-action="title" href="#"> Education News Roundup: May 2, 2016 </a>, <a data-action="title" href="#"> NBC CBS Keep Up Attack on Religious Freedom Laws </a>, <a data-action="title" href="#"> NBC Slams Indiana Religious Freedom Law...Then Starts Week-Long Series on Faith </a>, <a data-action="title" href="#"> Networks Again Bash Indiana for Causing 'National Outcry' and 'Uproar' Over Religious Freedom Law </a>]

I've been trying to figure this out for HOURS, so any help will be much appreciated.

Upvotes: 1

Views: 376

Answers (1)

Ajax1234

Reputation: 71471

You can anchor your element search to the divs with class "item":

from selenium import webdriver
from bs4 import BeautifulSoup as soup
import csv
import re

d = webdriver.Chrome()
d.get('https://www.lexisnexis.com/en-us/home.page')
results = [[(lambda x: x['href'] if i == 'a' else getattr(x, 'text', None))(c.find(i))
            for i in ['a', 'time', 'h5', 'p']]
           for c in soup(d.page_source, 'html.parser').find_all('div', {'class': 'item'})]
with open('lexisNexis.csv', 'w') as f:
  write = csv.writer(f)
  write.writerows([['source', 'timestamp', 'tags', 'headline'],
                   *[re.findall(r'(?<=//www\.)\w+(?=\.com)', a) + b
                     for a, *b in results if all([a, *b])]])

Output:

source,timestamp,tags,headline
law360,04 Sep 2018,Labor & Employment Law,11th Circ. Revives Claim In Ex-Aaron's Worker FMLA Suit
law360,04 Sep 2018,Workers' Compensation,Back To School: Widener's Rod Smolla Talks Free Speech
law360,04 Sep 2018,Tax Law,Ex-Sen. Kyl Chosen To Take Over McCain's Senate Seat
law360,04 Sep 2018,Energy,Mass. Top Court Says Emission Caps Apply To Electric Cos.
lexisnexis,04 Sep 2018,Immigration Law,Suspension of Premium Processing: Another Attack On the H-1B Program (Cyrus Mehta)
law360,04 Sep 2018,Real Estate Law,Privilege Waived For Some Emails In NJ Real Estate Row
law360,04 Sep 2018,Banking & Finance,Cos. Caught Between Iran Sanctions And EU Blocking Statute
law360,04 Sep 2018,Mergers & Acquisitions,Former Paper Co. Tax VP Sues For Severance Pay After Merger
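The source column comes from the lookaround regex applied to each href; a quick self-contained check against a sample URL:

```python
import re

# Check of the lookaround pattern used above: grab the word between
# "//www." and ".com" in a sample href.
href = "https://www.law360.com/articles/12345"
print(re.findall(r"(?<=//www\.)\w+(?=\.com)", href))  # ['law360']
```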

Upvotes: 1
