Reputation: 21
I am using BeautifulSoup to scrape job listings from a career page. I am having trouble printing out only the information I need.
This is what the HTML looks like:
<ul class="list-group">
<li class="list-group-item">
<h4 class="list-group-item-heading">
<a href="http://careers.steelseries.com/apply/3LXwyjYOrb/Customer-Experience-Specialist">
Customer Experience Specialist </a>
</h4>
<ul class="list-inline list-group-item-text">
<li><i class="fa fa-map-marker"></i>Chicago, IL</li>
<li><i class="fa fa-sitemap"></i>Operations</li>
</ul>
What I want it to print out is:
Customer Experience Specialist
Chicago, IL
Operations
--------------
The code I tried is this:
section = soup.find_all('div', class_='col col-xs-7 jobs-list')
for elem in section:
    wrappers = elem.find('ul').get_text()
    print(wrappers)
But that prints the text with too many newlines and spaces, like so:
Customer Experience Specialist
Chicago, IL
Operations
Keep in mind there are also about 4 empty lines above the job title and another newline after 'Operations'.
Upvotes: 1
Views: 41
Reputation: 75
Try this:
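sections = soup.find_all('div', class_='col col-xs-7 jobs-list')
for section in sections:
    # find_all returns tags, not strings, so get the text first, then drop blank lines
    lines = [line.strip() for line in section.get_text().split("\n") if line.strip()]
    print("\n".join(lines))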
Regards!
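The same cleanup can probably be left to BeautifulSoup itself. Below is a minimal sketch, assuming each job sits in an `li` with class `list-group-item` as in the question's HTML. The `html` string is a trimmed copy of that snippet with closing tags added so it parses on its own, and `format_job` is a hypothetical helper name. `stripped_strings` yields each text fragment with surrounding whitespace removed and skips whitespace-only fragments.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question; the closing </li> and </ul>
# tags are an assumption added so the snippet is self-contained.
html = """
<ul class="list-group">
  <li class="list-group-item">
    <h4 class="list-group-item-heading">
      <a href="http://careers.steelseries.com/apply/3LXwyjYOrb/Customer-Experience-Specialist">
        Customer Experience Specialist </a>
    </h4>
    <ul class="list-inline list-group-item-text">
      <li><i class="fa fa-map-marker"></i>Chicago, IL</li>
      <li><i class="fa fa-sitemap"></i>Operations</li>
    </ul>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")


def format_job(item):
    # Join the stripped, non-empty text fragments of one job entry, one per line
    return "\n".join(item.stripped_strings)


# Only the outer <li> elements carry the list-group-item class
jobs = [format_job(item) for item in soup.find_all("li", class_="list-group-item")]

for job in jobs:
    print(job)
    print("--------------")
```

Selecting the `li` elements rather than the enclosing `div` keeps each job separate, which makes it easy to print the separator line after every entry.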
Upvotes: 1
Reputation: 411
After the get_text() call, add rstrip(). This removes all trailing whitespace, not just a single newline. Note that rstrip() leaves leading whitespace alone; strip() removes it from both ends.
Otherwise, if there is only one line in the string S, use S.splitlines()[0].
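To make the difference between these string methods concrete, here is a small sketch on a made-up string. The value of `raw` is an assumption that only resembles what get_text() returned in the question; it is not the page's actual output.

```python
# Hypothetical sample resembling the get_text() output described in the question
raw = "\n\n\n\n  Customer Experience Specialist \n\n Chicago, IL\nOperations\n\n"

trailing_only = raw.rstrip()  # trailing whitespace removed, leading blank lines survive
both_ends = raw.strip()       # whitespace removed from both ends, interior untouched

# Strip every line and drop the empty ones to clean up the interior as well
lines = [line.strip() for line in raw.splitlines() if line.strip()]
cleaned = "\n".join(lines)

print(cleaned)
```

Neither rstrip() nor strip() touches the blank lines and padding between the entries, so the per-line pass is what produces the compact three-line result.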
Upvotes: 0