I have a Python list from some web scraping that looks like the list below.
['\n Elementary School: HUTCHINSON \n High School: RIVERSIDE \n Middle School: ROLLING RIDGE \n ... ]
My goal is to retrieve the high school name of each listing I pull. For this particular listing, the high school is RIVERSIDE. What is the best way to extract this information?
In other words, I want to end up with a variable hs = "RIVERSIDE" for this particular listing.
Side note: this data is from housing listings on realtor.com
You can try this:
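ls = ['\n Elementary School: HUTCHINSON \n High School: RIVERSIDE \n Middle School: ROLLING RIDGE \n ']
# Split the blob into lines, keep the one that mentions "High School",
# take the text after the first colon and trim the surrounding spaces.
hs = [line.split(':', 1)[1].strip() for line in ls[0].split('\n') if 'High School' in line][0]
print('hs =', hs)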
Output:
hs = RIVERSIDE
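If you need the other schools too, or want to avoid an IndexError when a listing has no high school line, a minimal sketch is to parse every "Label: VALUE" line of a listing into a dict. This assumes each listing is a single string shaped like the sample above; parse_schools is a hypothetical helper name, not part of any library.

```python
def parse_schools(blob):
    """Hypothetical helper: turn one 'Label: VALUE' blob into a dict."""
    schools = {}
    for line in blob.splitlines():
        # Skip blank lines and anything without a "Label: value" shape.
        if ':' not in line:
            continue
        # maxsplit=1 keeps any later colons inside the value.
        label, value = line.split(':', 1)
        schools[label.strip()] = value.strip()
    return schools

ls = ['\n Elementary School: HUTCHINSON \n High School: RIVERSIDE \n Middle School: ROLLING RIDGE \n ']
schools = parse_schools(ls[0])
hs = schools.get('High School')  # None if the listing has no high school line
print('hs =', hs)  # hs = RIVERSIDE
```

Using dict.get means a listing without a high school yields None instead of raising, which is convenient when looping over many scraped listings.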
Or you can use the re module:
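import re
# Capture everything after "High School:" up to the end of that line;
# "." does not match "\n", so the match stops at the line break.
highschoolregex = re.compile(r'High School:\s*(.+)')
hs = highschoolregex.search(ls[0]).group(1).strip()  # drop the trailing space
print('hs =', hs)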
Output:
hs = RIVERSIDE
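Since you pull many listings, note that re.search returns None when a listing has no "High School" line, and calling .group() on None raises an AttributeError. A minimal sketch that guards against this, assuming each listing is one string like the sample; HIGH_SCHOOL_RE, high_school and the second listing are hypothetical:

```python
import re

# Text after "High School:" up to the end of that line.
HIGH_SCHOOL_RE = re.compile(r'High School:\s*(.+)')

def high_school(blob):
    """Return the high school name in one listing blob, or None if absent."""
    match = HIGH_SCHOOL_RE.search(blob)
    # search() returns None when nothing matches, so check before .group().
    return match.group(1).strip() if match else None

# Hypothetical listings: the second one has no high school line.
listings = [
    '\n Elementary School: HUTCHINSON \n High School: RIVERSIDE \n Middle School: ROLLING RIDGE \n ',
    '\n Elementary School: HUTCHINSON \n Middle School: ROLLING RIDGE \n ',
]
high_schools = [high_school(blob) for blob in listings]
print(high_schools)  # ['RIVERSIDE', None]
```

Compiling the pattern once at module level avoids rebuilding it for every listing.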