mr. one

Reputation: 3

How to remove big spaces in my scraped texts?

I am trying to remove the large blocks of whitespace (blank lines and leading spaces) from the output of this code:

from bs4 import BeautifulSoup
import requests


url = 'https://www.rucoyonline.com/characters/Something' 
response = requests.get(url)
print(response.status_code)

soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table', class_='character-table table table-bordered')
print(table.get_text())

Result after running the code:

Character Information




Name
Something


Level
28


Last online

                    about 6 years ago



Born
September 03, 2016


string() is not working; I think that is because of BeautifulSoup.

Upvotes: 0

Views: 59

Answers (3)

rafathasan

Reputation: 572

A one-line answer that drops the empty lines:

print("\n".join([s for s in table.get_text().split("\n") if s]))

Output:

Character Information
Name
Something
Level
28
Last online
                    about 6 years ago
Born
September 03, 2016

To also remove leading and trailing spaces from each line:

print("\n".join([s.strip() for s in table.get_text().split("\n") if s]))

Output:

Character Information
Name
Something
Level
28
Last online
about 6 years ago
Born
September 03, 2016
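
One caveat with the list comprehension above: the `if s` filter runs before `strip()`, so a line that contains only spaces would pass the filter and come out as an empty line. A minimal sketch that filters after stripping is shown below. The `raw` string is an assumption modelled on the question's output, and `compact_lines` is a hypothetical helper name, not part of BeautifulSoup.

```python
# Sample text modelled on the output in the question; the real page may differ.
raw = (
    "Character Information\n\n\n\n\n"
    "Name\nSomething\n\n\n"
    "Level\n28\n\n\n"
    "Last online\n\n                    about 6 years ago\n\n\n\n"
    "Born\nSeptember 03, 2016\n"
)


def compact_lines(text):
    """Strip every line, then drop the lines that are empty after stripping."""
    stripped = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in stripped if line)


cleaned = compact_lines(raw)
print(cleaned)
```

With real scraped data, `compact_lines(table.get_text())` would be called in place of the sample string.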

Alternatively, BeautifulSoup's get_text() can do the same through its separator and strip arguments:

print(table.get_text("\n", strip=True))

Output:

Character Information
Name
Something
Level
28
Last online
about 6 years ago
Born
September 03, 2016

Upvotes: 1

HedgeHog

Reputation: 25196

There is no need for a regex or for join() over a list comprehension. Simply use the standard parameters of get_text(), a separator and strip=True:

table.get_text('\n',strip=True)

Example

from bs4 import BeautifulSoup
import requests

url = 'https://www.rucoyonline.com/characters/Something' 
response = requests.get(url)
print(response.status_code)

soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table', class_='character-table table table-bordered')
print(table.get_text('\n', strip=True))

Output

Character Information
Name
Something
Level
28
Last online
about 6 years ago
Born
September 03, 2016

Upvotes: 0

Rahul K P

Reputation: 16081

Since you are already using BeautifulSoup, you can extract the text of each table row and remove the newlines inside it:

table_values = [item.text.strip() for item in table.find_all('tr')]
for item in table_values:
    print(item.replace('\n', ''))

Output

Character Information
NameSomething
Level28
Last online                    about 6 years ago
BornSeptember 03, 2016
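
Note that the output above runs each label and its value together ("NameSomething"). A possible variation, sketched here under assumptions, strips every cell separately and joins the cells of a row with a separator. The HTML below is hypothetical markup that approximates the character table (the real page may be structured differently), and the ": " separator is an arbitrary choice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup approximating the character table; the real page may differ.
html = """
<table class="character-table table table-bordered">
  <thead><tr><th colspan="2">Character Information</th></tr></thead>
  <tbody>
    <tr><td>Name</td><td>Something</td></tr>
    <tr><td>Level</td><td>28</td></tr>
    <tr><td>Last online</td><td>
                    about 6 years ago
    </td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="character-table")

# Strip each cell on its own, then join the cells of one row with ": ".
rows = [
    ": ".join(cell.get_text(strip=True) for cell in tr.find_all(["th", "td"]))
    for tr in table.find_all("tr")
]
print("\n".join(rows))
```

Stripping per cell keeps the label and the value apart, so each row can also be split back into a pair later if needed.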

Upvotes: 0
