Parsing Web Data in Python (Using Strings)

Question

I have a Python list from some web scraping that looks like the list below.

['\n           Elementary School: HUTCHINSON \n           High School: RIVERSIDE \n           Middle School: ROLLING RIDGE \n  ... ]

My goal is to retrieve the high school name of each listing I pull. For this particular listing, the high school is RIVERSIDE. What is the best way to extract this information?

In other words, I want to end up with a variable hs = "RIVERSIDE" for this listing.

Side note: this data is from housing listings on realtor.com

MrNobody33 · Accepted Answer

You can try this:

ls=['\n           Elementary School: HUTCHINSON \n           High School: RIVERSIDE \n           Middle School: ROLLING RIDGE \n ']


hs=[i.split(':')[1].strip() for i in ls[0].split('\n') if 'High School' in i][0]
print('hs =',hs)

Output:

hs = RIVERSIDE
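
If the other schools are needed too, the same line-splitting idea can collect every label in one pass. The following is only a minimal sketch: it assumes each line of the scraped string has the form "Label: NAME", and the function name parse_schools is made up for illustration.

```python
def parse_schools(listing):
    """Map each 'Label: NAME' line of one scraped string to {label: name}."""
    schools = {}
    for line in listing.splitlines():
        # partition splits on the first colon only, so a name containing ':' stays whole
        label, sep, name = line.partition(':')
        if sep:  # skip blank lines and lines without a colon
            schools[label.strip()] = name.strip()
    return schools


ls = ['\n           Elementary School: HUTCHINSON \n           High School: RIVERSIDE \n           Middle School: ROLLING RIDGE \n ']
schools = parse_schools(ls[0])
hs = schools.get('High School')  # None if the listing has no high school line
print('hs =', hs)
```

Using .get() instead of indexing means a listing without a high school yields None rather than raising an error.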

Or you can use re:

import re
highschoolregex = re.compile(r'(High School)[:] (\w.+)')
hs = highschoolregex.search(ls[0]).group(2).strip()  # strip the trailing space captured by .+
print('hs =',hs)

Output:

hs = RIVERSIDE
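
Since the question mentions pulling many listings, note that search() returns None when a listing has no "High School" line, so calling .group() on it directly would raise an AttributeError. Here is a hedged sketch of one way to guard against that. The helper name high_school, the pattern variant, and the second listing are assumptions made up for illustration, not part of the original data.

```python
import re

# [^\S\n]* matches spaces and tabs but not newlines; the lazy (.+?) leaves
# trailing spaces out of the captured name. MULTILINE lets $ match at each line end.
HIGH_SCHOOL_RE = re.compile(r'High School:[^\S\n]*(.+?)[^\S\n]*$', re.MULTILINE)


def high_school(listing):
    """Return the high school name from one scraped string, or None if absent."""
    match = HIGH_SCHOOL_RE.search(listing)
    return match.group(1) if match else None


listings = [
    '\n   Elementary School: HUTCHINSON \n   High School: RIVERSIDE \n   Middle School: ROLLING RIDGE \n ',
    '\n   Elementary School: EXAMPLE \n ',  # made-up listing with no high school line
]
names = [high_school(text) for text in listings]
print(names)
```

With this shape, listings that lack the field show up as None in the result and can be filtered or logged afterwards.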
