Rock73

Reputation: 1

How do I remove certain elements from a list in python?

I'm scraping some data from a website, but some of it comes out with a "\" in front of it. I tried the following code, but it raised an error.

print([s.strip('\') for s in feet])    *EOL while scanning string literal
print([s.replace('\', ') for s in feet])

The code after the '\' in the first line became italicized, and I have no clue what to do about this.

from lxml import html
import requests

list1 = []
height = []

user_website = "https://www.disabled-world.com/calculators-charts/height-weight.php"

page = requests.get(user_website)
tree = html.fromstring(page.content)
list2 = tree.xpath('//td/text()')

for x in list2:
    list_holder = x.split(" ")
    for i in list_holder:
        list1.append(i.lower())

subs = "'"
feet = [i for i in list2 if subs in i]

subs2 = '"'
inches = [i for i in list2 if subs in i]

print([s.strip('\') for s in feet])
print([s.replace('\', ') for s in feet])

y = 0

for x in feet:
    height.append(feet[y])
    height.append(inches[y])
    y+=1

print(height)

Upvotes: 0

Views: 112

Answers (2)

Roland Deschain

Reputation: 2850

So I tried to extract your code, and this:

from lxml import html 
import requests 

list1 = [] 
height = [] 
user_website = "https://disabled-world.com/calculators-charts/height-weight.php" 
page = requests.get(user_website) 
tree = html.fromstring(page.content) 
list2 = tree.xpath('//td/text()')

for x in list2: 
    list_holder = x.split(" ") 
    for i in list_holder: 
        list1.append(i.lower()) 
subs = "'" 
feet = [i for i in list2 if subs in i] 
subs2 = '"' 
inches = [i for i in list2 if subs2 in i]  # use subs2 (the inch mark) here, not subs
print([s.strip('"') for s in feet]) 
#print([s.replace('\', ') for s in feet]) 
y = 0 

#for x in feet: 
#    height.append(feet[y]) 
#    height.append(inches[y]) 
#    y+=1 
#    print(height)

gives me the following output:

["4' 6", "4' 7", "4' 8", "4' 9", "4' 10", "4' 11", "5' 0", "5' 1", "5' 2", "5' 3", "5' 4", "5' 5", "5' 6", "5' 7", "5' 8", "5' 9", "5' 10", "5' 11", "6' 0", "6' 1", "6' 2", "6' 3", "6' 4", "6' 5", "6' 6", "6' 7", "6' 8", "6' 9", "6' 10", "6' 11", "7' 0"]

From your question, I assume this is what you want?

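Anyway, the problem (as far as I could see) is the string literal rather than the list. In '\' the backslash escapes the closing quote, so the string is never terminated and Python reports "EOL while scanning string literal"; a literal backslash has to be written as '\\'. Also note that strip() takes a string of characters to remove from both ends, which is why strip('"') above drops the trailing inch mark. The backslashes themselves are most likely not in your data at all: when a list is printed, Python shows the repr of each string, and the repr escapes an apostrophe as \' whenever the string contains both kinds of quote.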
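As a small illustration of how backslash literals and strip() behave, here is a minimal sketch. The sample values are made up for the example and are not taken from the scraped page.

```python
# A lone backslash must be escaped inside an ordinary string literal.
backslash = "\\"
print(len(backslash))  # 1

# Hypothetical value that really does start and end with a backslash.
sample = "\\4' 6\"\\"
stripped = sample.strip(backslash)
print(stripped)  # 4' 6"

# strip() treats its argument as a set of characters and only removes
# them from the two ends of the string, never from the middle.
trimmed = "4' 6\"".strip("\"'")
print(trimmed)  # 4' 6
```

The apostrophe after the 4 survives the last call because it sits in the middle of the string; replace() would be the tool for removing it.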

Upvotes: 1

fireball.1

Reputation: 1521

Until you post your code, it's hard to tell why your list is improperly formatted. However, you can use the following to clean it up:

# Copy and paste the data into a variable as a triple-quoted string (or convert the list to a string)
a='''['4' 6"', '4' 6"', '4' 7"', '4' 7"', '4' 8"', '4' 8"', '4' 9"', '4' 9"', '4' 10"', '4' 10"', '4' 11"', '4' 11"', '5' 0"', '5' 0"', '5' 1"', '5' 1"', '5' 2"', '5' 2"', '5' 3"', '5' 3"', '5' 4"', '5' 4"', '5' 5"', '5' 5"', '5' 6"', '5' 6"', '5' 7"', '5' 7"', '5' 8"', '5' 8"', '5' 9"', '5' 9"', '5' 10"', '5' 10"', '5' 11"', '5' 11"', '6' 0"', '6' 0"', '6' 1"', '6' 1"', '6' 2"', '6' 2"', '6' 3"', '6' 3"', '6' 4"', '6' 4"', '6' 5"', '6' 5"', '6' 6"', '6' 6"', '6' 7"', '6' 7"', '6' 8"', '6' 8"', '6' 9"', '6' 9"', '6' 10"', '6' 10"', '6' 11"', '6' 11"', '7' 0"', '7' 0"']'''

m = a.strip('[').strip(']')  # remove the square brackets
x = []
n = m.split(',')             # create a list of elements
for i in n:
    x.append(i.strip(" ").strip("'"))  # remove the excess quotes and spaces
print(x)

The code above gives x as:

['4\' 6"', '4\' 6"', '4\' 7"', '4\' 7"', '4\' 8"', '4\' 8"', '4\' 9"', '4\' 9"', '4\' 10"', '4\' 10"', '4\' 11"', '4\' 11"', '5\' 0"', '5\' 0"', '5\' 1"', '5\' 1"', '5\' 2"', '5\' 2"', '5\' 3"', '5\' 3"', '5\' 4"', '5\' 4"', '5\' 5"', '5\' 5"', '5\' 6"', '5\' 6"', '5\' 7"', '5\' 7"', '5\' 8"', '5\' 8"', '5\' 9"', '5\' 9"', '5\' 10"', '5\' 10"', '5\' 11"', '5\' 11"', '6\' 0"', '6\' 0"', '6\' 1"', '6\' 1"', '6\' 2"', '6\' 2"', '6\' 3"', '6\' 3"', '6\' 4"', '6\' 4"', '6\' 5"', '6\' 5"', '6\' 6"', '6\' 6"', '6\' 7"', '6\' 7"', '6\' 8"', '6\' 8"', '6\' 9"', '6\' 9"', '6\' 10"', '6\' 10"', '6\' 11"', '6\' 11"', '7\' 0"', '7\' 0"']
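To check whether the backslashes in a list display like this are really part of the strings, one option is to print the elements individually. This is a minimal sketch using two hand-typed elements in the same form as the output above; the variable names are only for illustration.

```python
# Two elements in the same form as the list shown above.
heights = ['4\' 6"', '5\' 0"']

# Printing the list shows the repr of each string, with \' escapes.
print(heights)  # ['4\' 6"', '5\' 0"']

# Printing each element shows the actual characters.
for h in heights:
    print(h)  # 4' 6"  and then  5' 0"

# No element contains a real backslash character.
has_backslash = any("\\" in h for h in heights)
print(has_backslash)  # False
```

If this prints False for the scraped data as well, there is nothing to strip; the backslash is only how Python displays an apostrophe inside a single-quoted string.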

Upvotes: 0
