Reputation: 845
The code below yields a value for the "resultStats" ID, which I would like to save in a CSV file. Is there any smart way to have the "desired_google_queries" (i.e. the search terms) in column A and the "resultStats" values in column B of the CSV?
I saw that there are a number of threads on this topic, but none of the solutions I have read through worked for this specific situation.
from bs4 import BeautifulSoup
import urllib.request
import csv

desired_google_queries = ['Elon Musk', 'Tesla', 'Microsoft']

for query in desired_google_queries:
    url = 'http://google.com/search?q=' + query
    req = urllib.request.Request(url, headers={'User-Agent': "Magic Browser"})
    response = urllib.request.urlopen(req)
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser')
    resultStats = soup.find(id="resultStats").string
    print(resultStats)
Upvotes: 1
Views: 809
Reputation: 1035
Instead of writing the file line by line, you can write it all in one go by first collecting the results in a pandas DataFrame. See the code below:
from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

data_dict = {'desired_google_queries': [],
             'resultStats': []}

desired_google_queries = ['Elon Musk', 'Tesla', 'Microsoft']

for query in desired_google_queries:
    url = 'http://google.com/search?q=' + query
    req = urllib.request.Request(url, headers={'User-Agent': "Magic Browser"})
    response = urllib.request.urlopen(req)
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser')
    resultStats = soup.find(id="resultStats").string

    data_dict['desired_google_queries'].append(query)
    data_dict['resultStats'].append(resultStats)

df = pd.DataFrame(data=data_dict)
df.to_csv(path_or_buf='path/where/you/want/to/save/thisfile.csv', index=None)
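A quick way to sanity-check the result is to read the file back with pandas; this is only an illustrative snippet, and the path below is the same placeholder used above:
import pandas as pd

# Read the saved CSV back and confirm the two expected columns are present.
check = pd.read_csv('path/where/you/want/to/save/thisfile.csv')
print(check.columns.tolist())  # ['desired_google_queries', 'resultStats']
print(check.head())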
Upvotes: 1
Reputation: 4865
I took the liberty of rewriting this to use the Requests library instead of urllib, but this shows how to do the CSV writing, which I think is what you were more interested in:
from bs4 import BeautifulSoup
import requests
import csv

desired_google_queries = ['Elon Musk', 'Tesla', 'Microsoft']

result_stats = dict()

for query in desired_google_queries:
    url = 'http://google.com/search?q=' + query
    response = requests.get(url)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    result_stats[query] = soup.find(id="resultStats").string

with open('searchstats.csv', 'w', newline='') as fout:
    cw = csv.writer(fout)
    for q in desired_google_queries:
        cw.writerow([q, result_stats[q]])
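If you want the columns labelled in the output file (column A as the query, column B as the result count), a header row can be written before the loop; this is just a small optional variation on the snippet above:
with open('searchstats.csv', 'w', newline='') as fout:
    cw = csv.writer(fout)
    # Optional header row naming the two columns.
    cw.writerow(['query', 'resultStats'])
    for q in desired_google_queries:
        cw.writerow([q, result_stats[q]])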
Upvotes: 1
Reputation: 845
Unfortunately, the original answer has been deleted. Please find the code below for everyone else interested in this situation. Thanks to the user who posted the solution in the first place:
with open('eggs.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['query', 'resultStats'])
    for query in desired_google_queries:
        ...
        spamwriter.writerow([query, resultStats])
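For completeness, one way the elided part can be filled in is with the scraping loop from the question itself. The snippet below is only a sketch combining the two pieces, keeping the space delimiter and '|' quote character from the reconstructed answer:
from bs4 import BeautifulSoup
import urllib.request
import csv

desired_google_queries = ['Elon Musk', 'Tesla', 'Microsoft']

with open('eggs.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['query', 'resultStats'])
    for query in desired_google_queries:
        # Scraping code taken from the question above.
        url = 'http://google.com/search?q=' + query
        req = urllib.request.Request(url, headers={'User-Agent': "Magic Browser"})
        response = urllib.request.urlopen(req)
        html = response.read()
        soup = BeautifulSoup(html, 'html.parser')
        resultStats = soup.find(id="resultStats").string
        spamwriter.writerow([query, resultStats])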
Upvotes: 0