user3746017

Reputation: 35

Iterate through specific tags in python

I want to extract text from a website whose markup looks like this:

<a href="#N44">Avalon</a>
<a href="#N36">Avondale</a>
<a href="#N4">Bacon Park Area</a>

How do I select only the 'a' tags whose href starts with "#N"? There are several more like these.

I tried creating a list to iterate through, but when I run the code it selects only one element.

loc= ['#N0', '#N1', '#N2', '#N3', '#N4', '#N5'.....'#N100']

for i in loc:
    name=soup.find('a', attrs={'href':i})    
print(name)

I get

<a href="#N44">Avalon</a>

not

<a href="#N44">Avalon</a>
<a href="#N36">Avondale</a>
<a href="#N4">Bacon Park Area</a>

Better still, how do I get just the text?

Avalon
Avondale
Bacon Park Area

Thanks in advance!

Upvotes: 0

Views: 175

Answers (1)

mechanical_meat

Reputation: 169304

You're iterating over the items but not storing them anywhere, so each pass through the loop overwrites name. When the loop is done, all that's left in name is the last item.

You can put them in a list like below, and access the .text attribute to get just the name from the tag:

names = []

for i in loc:
    tag = soup.find('a', attrs={'href': i})
    if tag is not None:  # find() returns None when no tag has this href
        names.append(tag.text)

Result:

In [15]: names
Out[15]: ['Bacon Park Area', 'Avondale', 'Avalon']
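
To see why the original loop ends up with a single element, here is a small sketch with plain strings standing in for the tags. The items list is a hypothetical stand-in and not part of the question's code; no parsing is involved.

```python
# Hypothetical stand-in values for the tags found on the page.
items = ["Avalon", "Avondale", "Bacon Park Area"]

# Rebinding one variable on every pass keeps only the final value.
for item in items:
    name = item

# Appending to a list keeps every value, in loop order.
names = []
for item in items:
    names.append(item)

print(name)
print(names)
```

The same thing happens with soup.find: each call returns one tag, and it is gone on the next pass unless it is stored somewhere.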

If you'd rather not build the loc list at all, you can match the href with a regular expression instead:

import re

names = [tag.text for tag in soup.find_all('a', href=re.compile(r'^#N\d+$'))]

In the regular expression, \d matches a digit and + means one or more of the preceding item. The ^ and $ anchor the pattern so that the whole href has to be #N followed by digits.
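
A minimal, self-contained sketch of the regular-expression approach, run against the three links from the question. The two extra links (#top and /about#N7) are made up here to show what gets filtered out, and html.parser is an assumed choice of parser. The pattern is anchored with ^ and $ so that an href which merely contains #N plus digits is skipped.

```python
import re

from bs4 import BeautifulSoup

# Sample markup: the first three links come from the question,
# the last two are hypothetical and should NOT be selected.
html = """
<a href="#N44">Avalon</a>
<a href="#N36">Avondale</a>
<a href="#N4">Bacon Park Area</a>
<a href="#top">Back to top</a>
<a href="/about#N7">About</a>
"""

soup = BeautifulSoup(html, "html.parser")

# The whole href must be "#N" followed by one or more digits.
pattern = re.compile(r"^#N\d+$")

# find_all returns the matching tags in document order.
names = [tag.get_text(strip=True) for tag in soup.find_all("a", href=pattern)]

for name in names:
    print(name)
```

Without the anchors, the pattern would also match /about#N7, because Beautiful Soup tests a compiled pattern against the attribute value with search rather than a full match.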

Upvotes: 1
