I am scraping text rendered in an HTML page and using list comprehensions to process the text data it returns.
I grab two different objects (data and data2) from the web page, and I want to write each of them into its own list.
data = driver.find_elements_by_xpath('//*[@id="root"]/div/div[2]/div[1]/div/div/div[2]/div/div/div[1]/div/div[5]/div/div[5]')
data2 = driver.find_elements_by_xpath('//*[@id="root"]/div/div[2]/div[1]/div/div/div[2]/div/div/div[1]/div/div[5]/div/div[6]')
Because I am using Selenium WebDriver, these calls return lists of WebElement objects, so I need to iterate over them and pull out each element's text. That is what the first round of list comprehensions does, assigning the results to text and text2:
text = [i.text for i in data]
text2 = [i.text for i in data2]
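To see what this first step produces without a live browser, here is a minimal sketch that uses a hypothetical FakeElement dataclass in place of a real WebElement (only the .text attribute is modelled) and assumes the XPath matched a single element whose text is the sample string shown below:

```python
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modelled."""
    text: str


# Assumption: the XPath matched exactly one element whose visible text spans several lines.
data = [FakeElement("Running\nRunning Normally\nShavings\n47.6%\n739\n739\n3:38:53\n1:31:51\n0:00:00")]

# One string per matched element, with the line breaks still embedded in it.
text = [i.text for i in data]
print(len(text))  # 1
```

The result is a one-item list because each matched element contributes exactly one string, no matter how many lines that string contains.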
After the first list comprehension, the list looks like this:
['Running\nRunning Normally\nShavings\n47.6%\n739\n739\n3:38:53\n1:31:51\n0:00:00']
I want to split that string into a list of separate fields, so I follow it with a second list comprehension:
text = [i.split("\n")[:] for i in text]
text2 = [i.split("\n")[:] for i in text2]
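The printed result has double brackets because the comprehension still iterates over elements: each element's string becomes its own inner list. A minimal sketch, assuming the one-element list from above:

```python
text = ["Running\nRunning Normally\nShavings\n47.6%\n739\n739\n3:38:53\n1:31:51\n0:00:00"]

# str.split already returns a new list, so the trailing [:] copy is not needed.
nested = [i.split("\n") for i in text]
print(len(nested))  # 1: one inner list per matched element

# With exactly one matched element, the flat list of fields is the first entry.
fields = nested[0]
print(len(fields))  # 9
```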
When I print the list, it shows:
[['Running Slow', 'Slow and/or Small Stops', 'Shavings', '48.7%', '800', '800', '3:56:43', '1:31:51', '0:00:00']]
Any suggestions on how to clean this up or make it work better?
Code:
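from selenium.webdriver.common.by import By

# find_elements_by_xpath was removed in Selenium 4.3; find_elements(By.XPATH, ...) replaces it.
data = driver.find_elements(By.XPATH, '//*[@id="root"]/div/div[2]/div[1]/div/div/div[2]/div/div/div[1]/div/div[5]/div/div[5]')
data2 = driver.find_elements(By.XPATH, '//*[@id="root"]/div/div[2]/div[1]/div/div/div[2]/div/div/div[1]/div/div[5]/div/div[6]')
# One string per matched element
text = [i.text for i in data]
text2 = [i.text for i in data2]
# Split each element's string on newlines (this yields a list of lists)
text = [i.split("\n")[:] for i in text]
text2 = [i.split("\n")[:] for i in text2]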
print(text)
print(text2)
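The two XPaths are identical except for the last div index (5 versus 6), so one small cleanup is to build both from a shared prefix. This is a sketch only; the name panel_xpath is a hypothetical helper, not part of Selenium:

```python
# Shared prefix of both XPaths in the question; only the final div index differs.
BASE = '//*[@id="root"]/div/div[2]/div[1]/div/div/div[2]/div/div/div[1]/div/div[5]/div'


def panel_xpath(index: int) -> str:
    """Hypothetical helper: XPath of the index-th child div under BASE."""
    return f"{BASE}/div[{index}]"


xpath1 = panel_xpath(5)
xpath2 = panel_xpath(6)
print(xpath1)
print(xpath2)
```

With Selenium 4, each lookup then becomes driver.find_elements(By.XPATH, panel_xpath(5)), and a third panel would only need a different index rather than another copy of the full path.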
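I think this code should work, but I can't test it since I don't have the data. It reads each element's text and splits it in a single comprehension, giving one flat list per XPath: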
text=[j for i in data for j in i.text.split("\n")]
text2=[j for i in data2 for j in i.text.split("\n")]
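A nested comprehension like this reads left to right as two for loops: the outer one over elements, the inner one over each element's lines. Here is a minimal sketch that checks the behaviour offline, again assuming a hypothetical FakeElement stand-in for a WebElement and using the sample text from the question:

```python
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modelled."""
    text: str


data = [FakeElement("Running Slow\nSlow and/or Small Stops\nShavings\n48.7%\n800\n800\n3:56:43\n1:31:51\n0:00:00")]

# Equivalent to:
#   text = []
#   for i in data:
#       for j in i.text.split("\n"):
#           text.append(j)
text = [j for i in data for j in i.text.split("\n")]
print(len(text))  # 9: a flat list, no inner list

# str.splitlines() is a close alternative: it also splits on "\r\n" and does not
# produce a trailing empty string when the text ends with a newline.
text_alt = [j for i in data for j in i.text.splitlines()]
print(text_alt == text)  # True
```

If several elements match the XPath, their lines are concatenated into the same list, which is worth keeping in mind if each element represents a separate record.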