Reputation: 117
I have already extracted the following from a web page:
<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/in">Indiana</a>,
<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ia">Iowa</a>,
<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ks">Kansas</a>,
<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ky">Kentucky</a>,
I only want the href part (for example, href="united-states/il") of each extracted link. Currently I am trying something like this:
for state in soup_state.find('a', href=True):
    print(state['href'])
I continually receive the error:
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
I want to run this in a for loop so I can extract each state's URL, but so far I have been unable to.
Upvotes: 1
Views: 84
Reputation: 2904
You can use a regular expression to find these contents.
import re
lines = [
    '<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/in">Indiana</a>',
    '<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ia">Iowa</a>',
    '<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ks">Kansas</a>',
    '<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ky">Kentucky</a>',
]
for line in lines:
    # Match the whole href attribute, including its quoted value
    print(re.search(r'href="[^"]*"', line).group())
This will give the output:
href="united-states/in"
href="united-states/ia"
href="united-states/ks"
href="united-states/ky"
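If only the URL path is wanted, without the surrounding href="...", a capturing group is one way to get it. This is a minimal sketch reusing the sample lines above; the names pattern and paths are made up for illustration.

```python
import re

lines = [
    '<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/in">Indiana</a>',
    '<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ia">Iowa</a>',
    '<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ks">Kansas</a>',
    '<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ky">Kentucky</a>',
]

# The parentheses capture only the text between the quotes
pattern = re.compile(r'href="([^"]*)"')

paths = []  # hypothetical name for the collected URL paths
for line in lines:
    match = pattern.search(line)
    if match:  # skip any line that has no href attribute
        paths.append(match.group(1))

print(paths)
```

Checking match before calling group() avoids an AttributeError on lines that contain no href.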
Upvotes: 1
Reputation: 24940
I'm not sure how you got to soup_state, but try:
for state in soup_state:
    print(state['href'])
and see if it solves the problem.
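For context, here is a self-contained sketch of why this should work. It assumes soup_state came from a find_all() call, which the question does not show; the error message suggests so, because find_all() returns a ResultSet (a list of tags) and a ResultSet has no find() method. The html string and the state_urls name are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; the real page is not shown there
html = """
<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/in">Indiana</a>,
<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ia">Iowa</a>,
<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ks">Kansas</a>,
<a class="Directory-listLink" data-ya-track="todirectory" href="united-states/ky">Kentucky</a>,
"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: soup_state was built roughly like this.
# find_all() returns a ResultSet, i.e. a list of Tag objects.
soup_state = soup.find_all("a", class_="Directory-listLink", href=True)

# Iterate over the ResultSet directly; each item is a Tag, so ['href'] works
state_urls = [state["href"] for state in soup_state]
print(state_urls)
```

Calling find() on soup_state fails because find() belongs to a single tag or soup object, not to the list of results.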
Upvotes: 2