324

Reputation: 724

Parsing Web Data in Python (Using Strings)

I have a Python list, produced by web scraping, that looks like this:

['\n           Elementary School: HUTCHINSON \n           High School: RIVERSIDE \n           Middle School: ROLLING RIDGE \n  ... ] 

My goal is to retrieve the high school name of each listing I pull. For this particular listing, the high school is RIVERSIDE. What is the best way to extract this information?

For this listing, I want to end up with a variable hs = "RIVERSIDE".

Side note: this data is from housing listings on realtor.com

Upvotes: 2

Views: 52

Answers (1)

MrNobody33

Reputation: 6483

You can try this:

ls=['\n           Elementary School: HUTCHINSON \n           High School: RIVERSIDE \n           Middle School: ROLLING RIDGE \n ']


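# Find the line that mentions "High School", keep the text after the first colon,
# and strip the surrounding whitespace so no leading space is left in the name
hs = [line.split(':', 1)[1].strip() for line in ls[0].split('\n') if 'High School' in line][0]
print('hs =', hs)

Output:

hs = RIVERSIDE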

Or you can use re:

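import re
# Capture everything after "High School:" up to the end of that line
highschoolregex = re.compile(r'High School:\s*(.+)')
# strip() removes the trailing space that precedes the newline
hs = highschoolregex.search(ls[0]).group(1).strip()
print('hs =', hs)

Output:

hs = RIVERSIDE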
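
Since the question asks for the high school of each listing, the regex approach could be wrapped in a small helper and applied to every scraped string. This is only a sketch: the names `extract_high_school`, `HIGH_SCHOOL_RE`, and `listings` are hypothetical, the second sample string is made up to show the missing-value case, and it assumes every listing uses the same "High School: NAME" line format as the example.

```python
import re

# Hypothetical pattern: assumes the name follows "High School:" on the same line
HIGH_SCHOOL_RE = re.compile(r'High School:[ \t]*(.+)')

def extract_high_school(text):
    """Return the high school name found in text, or None if there is none."""
    match = HIGH_SCHOOL_RE.search(text)
    return match.group(1).strip() if match else None

# The first string is the sample from the question; the second is invented
# to show what happens when a listing has no high school line
listings = [
    '\n           Elementary School: HUTCHINSON \n           High School: RIVERSIDE \n           Middle School: ROLLING RIDGE \n ',
    '\n           Elementary School: EXAMPLE \n ',
]

high_schools = [extract_high_school(text) for text in listings]
print(high_schools)
```

Returning None instead of indexing with [0] or calling .group() directly avoids an exception when a listing does not include a high school.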

Upvotes: 1
