Reputation: 122
I have extracted a list of text from a section of a website. Specifically, I scraped the 'experience' section of LinkedIn and have extracted each work experience item within that section.
However, the data is in the form of a text list, and I am having issues formatting it as a CSV file in the way that I want.
My relevant code is below:
import csv
from selenium import webdriver
# Note: `sel` (a selector over the page source) and `parameters` (which holds
# file_name) are defined elsewhere in the script and are not shown here.

ChromeOptions = webdriver.ChromeOptions()
driver = webdriver.Chrome('/Users/jones/Downloads/chromedriver')
driver.get('https://www.linkedin.com/in/pauljgarner/')
rows = []
name = sel.xpath('normalize-space(//li[@class="inline t-24 t-black t-normal break-words"])').extract_first()
# One <li> element per work experience item
experience = driver.find_elements_by_xpath('//section[@id = "experience-section"]/ul//li')
rows.append([name])
for item in experience:
    # item.text is a single string with embedded line breaks
    rows[0].append(item.text)
    print(item.text)
    print("")
with open(parameters.file_name, 'w', encoding='utf8') as file:
    writer = csv.writer(file)
    writer.writerows(rows)
The Excel output I am getting from this code is below:
As you can see, it seems like a line break is separating each observation.
My desired Excel output is below:
(Note that each text list has its own variable names. For example, Company Name is for the first text list, and Company Name_2 for the second text list).
I suspect that I need to find a way to specify in Python that a line break is a delimiter in each list of text. However, I am unsure of how to do this. Any help would be appreciated.
Disclosure: I posted a question on this same issue a few days ago, but I am posting a more specific question on delimiters because I haven't seen anything about specifying line breaks as a delimiter when writing to CSV with Python.
Upvotes: 0
Views: 303
Reputation: 3010
I think you need to split each element of rows on '\n'. You also need to specify the headers to get the desired output.
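import csv

headers = ['Name', 'Title', ... ]
with open(parameters.file_name, 'w', encoding='utf8', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(headers)
    for row in rows:
        # Each row is a list of multi-line strings, so split every cell on '\n'
        fields = []
        for cell in row:
            fields.extend(cell.split('\n'))
        writer.writerow(fields)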
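To make the idea concrete, here is a minimal, self-contained sketch of splitting on line breaks and building the numbered headers (Company Name, Company Name_2, and so on). The sample data, the three base column names, and the helper names flatten_row and numbered_headers are all made up for illustration; the real scraped text will likely have a different number of lines per item, so adjust the base headers to match.

```python
import csv
import io


def flatten_row(row):
    """Split every multi-line cell on line breaks and return one flat list."""
    fields = []
    for cell in row:
        # `cell or ""` guards against a None name; blank lines are skipped
        for line in (cell or "").splitlines():
            if line.strip():
                fields.append(line.strip())
    return fields


def numbered_headers(base, blocks):
    """Build 'Name' plus one copy of `base` per block, suffixed _2, _3, ..."""
    headers = ["Name"]
    for i in range(1, blocks + 1):
        suffix = "" if i == 1 else f"_{i}"
        headers.extend(f"{h}{suffix}" for h in base)
    return headers


# Hypothetical stand-in for the scraped `rows` (name + one string per item)
rows = [["Jane Doe",
         "Engineer\nAcme\n2015 - 2018",
         "Analyst\nGlobex\n2012 - 2015"]]
base = ["Title", "Company Name", "Dates"]  # assumed columns per item

flat = [flatten_row(r) for r in rows]
headers = numbered_headers(base, blocks=len(rows[0]) - 1)

# Write to an in-memory buffer here; swap in open(..., newline='') for a file
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headers)
writer.writerows(flat)
csv_text = buffer.getvalue()
print(csv_text)
```

This assumes every experience item has the same number of lines. If some items have extra or missing lines, the columns will shift, and you would need to parse the label/value pairs instead of relying on position.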
Upvotes: 1