user1719826

Python: rearrange and remove characters from an HTML page title

I'm running Python 2.7.11 on Windows 10, using beautifulsoup4 and lxml.

import urllib2
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen("http://www.daisuki.net/us/en/anime/watch.GUNDAMUNICORNRE0096.13142.html"), "lxml")
Name = soup.title.string

print(Name.replace('#', ""))

Output:

01 DEPARTURE 0096 - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI

Desired Output:

MOBILE SUIT GUNDAM UNICORN RE:0096 - 01 DEPARTURE 0096

How would I go about removing the "- DAISUKI" at the end and reordering the string?
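For the first half of the question (dropping the site name), here is a minimal sketch that removes the suffix only when it is actually present. The `" - DAISUKI"` suffix comes from the example title. `strip_site_suffix` is a hypothetical helper name, and other sites would need their own suffix:

```python
# Assumption: every title from this site ends with " - DAISUKI".
SUFFIX = " - DAISUKI"

def strip_site_suffix(title, suffix=SUFFIX):
    """Return the title without the trailing suffix, if it is present."""
    title = title.strip()
    if title.endswith(suffix):
        # Slice off exactly the suffix; nothing else in the title is touched.
        return title[:-len(suffix)]
    # Titles without the suffix are returned unchanged (apart from strip()).
    return title

title = "01 DEPARTURE 0096 - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI"
print(strip_site_suffix(title))
```

Using `endswith` instead of a fixed slice means a title that lacks the suffix is not accidentally truncated.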

Upvotes: 0

Views: 46

Answers (2)

taesu

Reputation: 4580

Hacky solution incoming:

Name = "01 DEPARTURE 0096 - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI"
print("- ".join(reversed(Name.split('-')[:2])).strip())
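The approach is "hacky" because it splits on every `-`, so a hyphen inside the episode or series name shifts the pieces. The sketch below compares it with splitting on the `" - "` separator. The title is hypothetical, made up to contain an in-word hyphen, and the function names are illustrative only:

```python
# Hypothetical title with a hyphen inside the episode name (not from the site).
title = "01 RE-ENTRY - SERIES NAME - DAISUKI"

def hacky(name):
    # Split on every '-' and swap the first two pieces, as in the answer above.
    return "- ".join(reversed(name.split('-')[:2])).strip()

def on_separator(name):
    # Split only on the " - " separator, so in-word hyphens survive.
    return " - ".join(name.split(" - ")[1::-1])

print(hacky(title))         # ENTRY - 01 RE
print(on_separator(title))  # SERIES NAME - 01 RE-ENTRY
```

For the DAISUKI title in the question, both functions give the same result. They only diverge when a name contains a bare hyphen.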

Upvotes: 1

alecxe

Reputation: 474141

Split by " - " and rearrange the parts of the title:

>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> 
>>> soup = BeautifulSoup(urllib2.urlopen("http://www.daisuki.net/us/en/anime/watch.GUNDAMUNICORNRE0096.13142.html"), "lxml")
>>> Name = soup.title.string
>>> 
>>> " - ".join(Name.replace('#', "").split(" - ")[1::-1])
u'MOBILE SUIT GUNDAM UNICORN RE:0096 - 01 DEPARTURE 0096'
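The `[1::-1]` slice keeps only the first two segments. If a series name ever contained `" - "` itself, the rest of it would be dropped. Here is a sketch that assumes the format is always "episode - series - site" and treats everything between the first and last segments as the series. `rearrange_title` is a hypothetical name, and the second test title below is invented for illustration:

```python
def rearrange_title(title, sep=" - "):
    """Turn 'episode - series - site' into 'series - episode'.

    Assumes the first segment is the episode and the last is the site name;
    everything in between is joined back together as the series name.
    """
    cleaned = title.replace("#", "").strip()
    parts = cleaned.split(sep)
    if len(parts) < 3:
        # Not in the assumed format: return the cleaned title unchanged.
        return cleaned
    episode = parts[0]
    series = sep.join(parts[1:-1])  # drop the last segment (site name)
    return series + sep + episode

print(rearrange_title("01 DEPARTURE 0096 - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI"))
print(rearrange_title("02 EP - SHOW - PART TWO - DAISUKI"))  # hypothetical title
```

For the title in the question, this gives the same result as the one-liner above.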

Upvotes: 1
