Reputation: 438
I am using Selenium to scrape the "People also ask" questions and answers on Google, and I want to export the output (questions, answers and URLs) to a CSV
file, with each of them on a separate line.
Everything runs fine, and I can even print everything on separate lines, but when I check the output CSV,
the question and answer are all in one row.
My code looks like this:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.options import Options
from tqdm import tqdm
from time import sleep
import itertools
import threading
import time
import sys
import csv
import pandas as pd
query = "nflx"
clicks = 4
def search(query,clicks):
    with webdriver.Firefox() as driver:
        driver.get("https://www.google.com?hl=en")
        driver.find_element_by_xpath("//input[@aria-label='Search']").send_keys(query)
        driver.find_elements_by_xpath("/html/body/div/div[3]/form/div[2]/div[1]/div[3]/center/input[1]")
        searchbtn = driver.find_elements_by_xpath("//input[@aria-label='Google Search']")
        searchbtn[-1].click()
        #Questions with answers. Have to clean a little bit.
        paa = driver.find_elements_by_css_selector('div.related-question-pair')
        for i in range(clicks):
            paa[i].click()
            paa = driver.find_elements_by_css_selector('div.related-question-pair')
        list_paa = []
        for j in paa:
            p = format(j.text)
            print(p)
            list_paa.append(p)
To export I tried this:
with open('file1.csv', 'w',newline='\n', encoding='utf-8') as file:
    writer = csv.writer(file)
    for row in list_paa:
        writer.writerow(zip(row))
And this:
#Tried this
df = pd.DataFrame(list_paa, columns=["column"])
df.to_csv('list.csv', index=False)
Current CSV output when executing search(query,clicks):
Desired CSV output for all questions:
Upvotes: 0
Views: 328
Reputation: 66
I guess the easiest way to go about it would be to process the data in a for loop and split each text with splitlines().
As an example:
list_paa = []
for j in paa:
    p = format(j.text)
    p = p.splitlines()
    print(p)
    list_paa.append(p)
I'm sure there is more to add to this example for it to actually work as intended, but you get the idea :).
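To make the idea a bit more concrete, here is a minimal sketch of how the split lines could end up as separate CSV rows. It assumes each `j.text` is a multi-line string with the question, answer and URL on their own lines; the sample strings, `paa_text_to_rows` and `write_rows` are made up for illustration and are not what Google actually returns.

```python
import csv
import io


def paa_text_to_rows(blocks):
    """Turn multi-line text blocks into one single-cell row per non-empty line."""
    rows = []
    for block in blocks:
        for line in block.splitlines():
            line = line.strip()
            if line:  # skip blank lines between question, answer and URL
                rows.append([line])
    return rows


def write_rows(rows, file_obj):
    """Write each row (a list of cells) as its own CSV line."""
    writer = csv.writer(file_obj)
    writer.writerows(rows)


# Hypothetical stand-ins for the j.text values scraped with Selenium
sample = [
    "Question one?\nAnswer one.\nhttps://example.com/a",
    "Question two?\nAnswer two.\nhttps://example.com/b",
]

rows = paa_text_to_rows(sample)

# Written to an in-memory buffer here; a real file works the same way
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open('file1.csv', 'w', newline='', encoding='utf-8')`; the `csv` module documentation recommends `newline=''` for files handed to `csv.writer`. Note that `writerow` expects a list of cells, so passing a plain string (or `zip(row)` on a string) splits it character by character rather than line by line. If you would rather have question, answer and URL as columns of a single row, append `block.splitlines()` itself as the row instead of one list per line.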
Upvotes: 1