Reputation: 1
I'm scraping some data from a website, but some of it is coming out with a "\" in front of it. I tried the following two lines of code, but the first one raised the error "EOL while scanning string literal":
print([s.strip('\') for s in feet])
print([s.replace('\', ') for s in feet])
The code after the '\' in the first line became italicized, and I have no clue what to do about this.
from lxml import html
import requests
list1 = []
height = []
user_website = "https://www.disabled-world.com/calculators-charts/height-weight.php"
page = requests.get(user_website)
tree = html.fromstring(page.content)
list2 = tree.xpath('//td/text()')
for x in list2:
    list_holder = x.split(" ")
    for i in list_holder:
        list1.append(i.lower())
subs = "'"
feet = [i for i in list2 if subs in i]
subs2 = '"'
inches = [i for i in list2 if subs in i]
print([s.strip('\') for s in feet])
print([s.replace('\', ') for s in feet])
y = 0
for x in feet:
    height.append(feet[y])
    height.append(inches[y])
    y += 1
print(height)
Upvotes: 0
Views: 112
Reputation: 2850
So I tried to extract your code, and this:
from lxml import html
import requests
list1 = []
height = []
user_website = "https://disabled-world.com/calculators-charts/height-weight.php"
page = requests.get(user_website)
tree = html.fromstring(page.content)
list2 = tree.xpath('//td/text()')
for x in list2:
    list_holder = x.split(" ")
    for i in list_holder:
        list1.append(i.lower())
subs = "'"
feet = [i for i in list2 if subs in i]
subs2 = '"'
inches = [i for i in list2 if subs in i]
print([s.strip('"') for s in feet])
#print([s.replace('\', ') for s in feet])
y = 0
#for x in feet:
# height.append(feet[y])
# height.append(inches[y])
# y+=1
# print(height)
gives me the following output:
["4' 6", "4' 7", "4' 8", "4' 9", "4' 10", "4' 11", "5' 0", "5' 1", "5' 2", "5' 3", "5' 4", "5' 5", "5' 6", "5' 7", "5' 8", "5' 9", "5' 10", "5' 11", "6' 0", "6' 1", "6' 2", "6' 3", "6' 4", "6' 5", "6' 6", "6' 7", "6' 8", "6' 9", "6' 10", "6' 11", "7' 0"]
From your question, I assume this is what you want?
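Anyway, the problem (as far as I could see) was simply wrong usage of the strip() function. It expects a string, and every character in that string is removed from both ends of the source string. On top of that, '\' is not a valid string literal: the backslash escapes the closing quote, so the string never ends, which is what "EOL while scanning string literal" is complaining about. A literal backslash has to be written as '\\'.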
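To make the strip() behaviour concrete, here is a minimal sketch. The two sample values are made up to resemble the scraped table cells (they are not taken from the live page), and the variable names stripped and backslash are only for illustration.

```python
# Hypothetical sample values standing in for the scraped <td> text.
feet = ["4' 6\"", "5' 10\""]

# str.strip() treats its argument as a set of characters and removes
# any of them from both ends of the string.
stripped = [s.strip('"') for s in feet]
print(stripped)  # ["4' 6", "5' 10"]

# A single backslash has to be escaped inside a normal string literal;
# '\' on its own escapes the closing quote and never terminates.
backslash = '\\'
print(len(backslash))  # 1
```

Note that the inch mark is gone after strip('"') because it sat at the end of each string; a quote in the middle of a string would be left alone.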
Upvotes: 1
Reputation: 1521
Until you post your code, it's hard to figure out why your array is improperly formatted. However, you can use the following to fix this array:
#copy paste the data into a variable as a string using triple quotes (or convert this variable to a string)
a='''['4' 6"', '4' 6"', '4' 7"', '4' 7"', '4' 8"', '4' 8"', '4' 9"', '4' 9"', '4' 10"', '4' 10"', '4' 11"', '4' 11"', '5' 0"', '5' 0"', '5' 1"', '5' 1"', '5' 2"', '5' 2"', '5' 3"', '5' 3"', '5' 4"', '5' 4"', '5' 5"', '5' 5"', '5' 6"', '5' 6"', '5' 7"', '5' 7"', '5' 8"', '5' 8"', '5' 9"', '5' 9"', '5' 10"', '5' 10"', '5' 11"', '5' 11"', '6' 0"', '6' 0"', '6' 1"', '6' 1"', '6' 2"', '6' 2"', '6' 3"', '6' 3"', '6' 4"', '6' 4"', '6' 5"', '6' 5"', '6' 6"', '6' 6"', '6' 7"', '6' 7"', '6' 8"', '6' 8"', '6' 9"', '6' 9"', '6' 10"', '6' 10"', '6' 11"', '6' 11"', '7' 0"', '7' 0"']'''
m = a.strip('[').strip(']')  # remove the square brackets
x = []
n = m.split(',')  # create a list of elements
for i in n:
    x.append(i.strip(" ").strip("'"))  # remove the extra quotes and spaces
print(x)
The code above gives me x as:
['4\' 6"', '4\' 6"', '4\' 7"', '4\' 7"', '4\' 8"', '4\' 8"', '4\' 9"', '4\' 9"', '4\' 10"', '4\' 10"', '4\' 11"', '4\' 11"', '5\' 0"', '5\' 0"', '5\' 1"', '5\' 1"', '5\' 2"', '5\' 2"', '5\' 3"', '5\' 3"', '5\' 4"', '5\' 4"', '5\' 5"', '5\' 5"', '5\' 6"', '5\' 6"', '5\' 7"', '5\' 7"', '5\' 8"', '5\' 8"', '5\' 9"', '5\' 9"', '5\' 10"', '5\' 10"', '5\' 11"', '5\' 11"', '6\' 0"', '6\' 0"', '6\' 1"', '6\' 1"', '6\' 2"', '6\' 2"', '6\' 3"', '6\' 3"', '6\' 4"', '6\' 4"', '6\' 5"', '6\' 5"', '6\' 6"', '6\' 6"', '6\' 7"', '6\' 7"', '6\' 8"', '6\' 8"', '6\' 9"', '6\' 9"', '6\' 10"', '6\' 10"', '6\' 11"', '6\' 11"', '7\' 0"', '7\' 0"']
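It may help to see that the backslashes in this output are not part of the data. When Python prints a list it shows the repr of each string, and a string that contains both ' and " is shown in single quotes with the inner ' escaped. A small sketch with two entries in the style of the list above (the name has_backslash is just for illustration):

```python
# Two entries written the same way they appear in the printed list.
x = ['4\' 6"', '5\' 10"']

# Printing the list shows each element's repr, including the escape.
print(x)  # ['4\' 6"', '5\' 10"']

# Printing the elements themselves shows the actual characters.
for item in x:
    print(item)  # 4' 6"  and then  5' 10"

# No element actually contains a backslash character.
has_backslash = any('\\' in item for item in x)
print(has_backslash)  # False
```

So if the goal is only clean values, there may be nothing to strip; the escape shows up only when the whole list is printed.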
Upvotes: 0