D_usv

Reputation: 433

Issues using Regex on href with a tag using BeautifulSoup

I'm trying to extract the text of a tag whose href contains a certain string. Below is part of my sample code:

Experience = soup.find_all(id='background-experience-container')

Exp = {}

for element in Experience:
    Exp['Experience'] = {}


for element in Experience:
    role = element.find(href=re.compile("title").get_text()
    Exp['Experience']["Role"] = role


for element in Experience:
    company = element.find(href=re.compile("exp-company-name").get_text()
    Exp['Experience']['Company'] = company

Python doesn't seem to like the way I've written Exp['outer_key']['inner_key'] = value; it raises a SyntaxError.

I'm trying to build a nested dict that holds the role and company. I'll also look to include dates for each, but I haven't got that far yet.

Can anyone spot any glaringly obvious mistakes in my code?

Really appreciate any help with this!

Upvotes: 1

Views: 195

Answers (1)

furas

Reputation: 142631

find_all can return many elements (even when you search by id), so it is better to use a list to keep all the values: Exp = [].

import re

# soup is the BeautifulSoup object from the question
Experience = soup.find_all(id='background-experience-container')

# create empty list
Exp = []

for element in Experience:
    # create empty dictionary
    dic = {}

    # add elements to dictionary
    dic['Role'] = element.find(href=re.compile("title")).get_text()
    dic['Company'] = element.find(href=re.compile("exp-company-name")).get_text()

    # add dictionary to list
    Exp.append(dic)

# display

print(Exp[0]['Role'])
print(Exp[0]['Company'])

print(Exp[1]['Role'])
print(Exp[1]['Company'])

# or

for x in Exp:
    print(x['Role'])
    print(x['Company'])
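
The list approach above can be tried end to end with a small sketch like the one below. The HTML snippet and the helper name text_of_link are made up for illustration (the real page markup is not shown in the question), and the None check is an extra precaution: find returns None when nothing matches, and calling get_text() on None would raise AttributeError.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example only; the real page will differ.
html = """
<div id="background-experience-container">
  <a href="/search?title=engineer">Engineer</a>
  <a href="/company/exp-company-name/acme">Acme</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def text_of_link(parent, pattern):
    """Return the text of the first tag whose href matches pattern, or None."""
    tag = parent.find(href=re.compile(pattern))
    # find() returns None when nothing matches, so guard before get_text()
    return tag.get_text(strip=True) if tag is not None else None


exp = []
for element in soup.find_all(id="background-experience-container"):
    exp.append({
        "Role": text_of_link(element, "title"),
        "Company": text_of_link(element, "exp-company-name"),
    })

# A pattern that matches no href gives None instead of an exception
missing = text_of_link(soup, "no-such-link")

print(exp)
print(missing)
```

With this sample markup the list should hold a single dictionary, and missing should be None.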

If you are sure that find_all gives you only one element (and you need the key 'Experience'), then you can do:

Experience = soup.find_all(id='background-experience-container')

# create main dictionary
Exp = {}

for element in Experience:
    # create empty dictionary
    dic = {}

    # add elements to dictionary
    dic['Role'] = element.find(href=re.compile("title")).get_text()
    dic['Company'] = element.find(href=re.compile("exp-company-name")).get_text()

    # add dictionary to main dictionary
    Exp['Experience'] = dic

# display

print(Exp['Experience']['Role'])
print(Exp['Experience']['Company'])

or

Experience = soup.find_all(id='background-experience-container')

# create main dictionary
Exp = {}

for element in Experience:
    Exp['Experience'] = {
        # entries in a dict literal must be separated by commas
        'Role': element.find(href=re.compile("title")).get_text(),
        'Company': element.find(href=re.compile("exp-company-name")).get_text(),
    }

# display

print(Exp['Experience']['Role'])
print(Exp['Experience']['Company'])
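
As for the SyntaxError in the question: the nested assignment is fine. The line before it, element.find(href=re.compile("title").get_text(), never closes the parenthesis opened by find(, so the parser runs on into the next line. Older Python versions tend to report the error on that following line, which makes the assignment look guilty. A minimal sketch to check this without running the scraper, using only the standard library (the helper name parses is made up; the code strings are only parsed, never executed):

```python
import ast

# The two lines from the question, with the unbalanced parenthesis
broken = (
    'role = element.find(href=re.compile("title").get_text()\n'
    "Exp['Experience']['Role'] = role\n"
)

# Same lines with the call to find(...) closed before .get_text()
fixed = (
    'role = element.find(href=re.compile("title")).get_text()\n'
    "Exp['Experience']['Role'] = role\n"
)


def parses(source):
    """Return True if source is syntactically valid Python (it is not executed)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(parses(broken))  # False
print(parses(fixed))   # True
```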

Upvotes: 1
