Reputation: 11
I want to extract only the http value from the onclick attribute in the result below.
Something like this:
http://14.63.194.48:1935/fado/fado7.stream/playlist.m3u8
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("http://smantv.net/tv/ajax.tv_channel.php")
bsObj = BeautifulSoup(html, "html.parser")
nameList = bsObj.find_all("",{"class":"spo_wc_active"})
for name in nameList:
    print(name)
RESULT
<a href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado7.stream/playlist.m3u8', '[MLB] 워싱턴 vs 애틀랜타 (한국어중계)', '');" class="spo_wc spo_h spo_wc_active">방송보기</a>
Upvotes: 1
Views: 216
Reputation: 733
So your code is
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("http://smantv.net/tv/ajax.tv_channel.php")
bsObj = BeautifulSoup(html, "html.parser")
nameList = bsObj.find_all("", {"class": "spo_wc_active"})
for name in nameList: print (name)
Which works OK.
The output is
<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado72.stream/playlist.m3u8', '[MLB] 밀워키 vs 샌디에이고', '');">방송보기</a>
<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado79.stream/playlist.m3u8', '[UEFA EL] 세비야 FC [white] vs FK 잘기리스 빌뉴스 [green]', '');">방송보기</a>
If you print the nameList (which is actually a bs4.element.ResultSet object) itself instead of iterating through it you get
[<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado72.stream/playlist.m3u8', '[MLB] 밀워키 vs 샌디에이고', '');">방송보기</a>, <a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado79.stream/playlist.m3u8', '[UEFA EL] 세비야 FC [white] vs FK 잘기리스 빌뉴스 [green]', '');">방송보기</a>]
So if it is not empty, you can index into it, like
print(nameList[0])
<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8', '[PAR D1] 클루브 솔 데 아메리카 vs 인데펜디엔테 FBC', '');">방송보기</a>
In the same way you can convert this element to a string, then split that string and finally strip the quotation marks. I'll show it in the interactive Python console:
>>> parts = str(nameList[0]).split(', ')
>>> parts
['<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle(\'\'', "'http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8'", "'[PAR D1] 클루브 솔 데 아메리카\tvs\t인데펜디엔테 FBC'", '\'\');">방송보기</a>']
So parts is now an actual list of plain strings, and you can get what you want:
>>> print(parts[1].strip("'"))
http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8
Of course, if you have more than one result, you need to wrap these steps in a loop.
If you fancy another way, I've just found that you can read the onclick attribute directly from the tag, without converting the whole tag to a string:
>>> nameList[0]['onclick']
"javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8', '[PAR D1] 클루브 솔 데 아메리카\tvs\t인데펜디엔테 FBC', '');"
>>> type(nameList[0]['onclick'])
<class 'str'>
So once more, this time including the quotation marks in the split() separator, so no strip() is needed:
>>> nameList[0]['onclick'].split("', '")[1]
'http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8'
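The scripts in this thread import re but never use it. As an alternative to split(), here is a minimal sketch that pulls the URL out of the onclick string with a regular expression. The function name extract_url and the pattern are my own assumptions for illustration, and the pattern assumes the URL is wrapped in single quotes as in the output above.

```python
import re

# Assumption: the URL is the first single-quoted http(s) value in the string.
URL_IN_QUOTES = re.compile(r"'(https?://[^']+)'")


def extract_url(onclick):
    """Return the first quoted http(s) URL in an onclick string, or None."""
    match = URL_IN_QUOTES.search(onclick)
    return match.group(1) if match else None


# Sample value copied from the console session above.
sample = (
    "javascript:post_chtitle('', "
    "'http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8', "
    "'[PAR D1] 클루브 솔 데 아메리카\tvs\t인데펜디엔테 FBC', '');"
)
print(extract_url(sample))
```

Unlike split(), this does not depend on the exact spacing between the arguments, and it returns None instead of raising IndexError when no URL is present.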
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("http://smantv.net/tv/ajax.tv_channel.php")
bsObj = BeautifulSoup(html, "html.parser")
nameList = bsObj.find_all("", {"class": "spo_wc_active"})
url_list = []
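for name in nameList:
    # Convert the tag to a string and split it on the argument separator
    parts = str(name).split(', ')
    # The second piece is the quoted URL; strip the single quotes
    url = parts[1].strip("'")
    url_list.append(url)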
So now you have url_list, a list of 0-n URLs which match your original pattern.
You can get the items in the list by iterating through it:
for url in url_list:
    print(url)
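The loop above splits the whole tag string. A variant based on the onclick attribute could look like the sketch below. The helper name urls_from_onclicks is hypothetical, and it takes plain strings so it runs without a network request. It skips missing attributes and values that do not have the expected post_chtitle(...) shape.

```python
def urls_from_onclicks(onclick_values):
    """Collect the second argument of each post_chtitle(...) call."""
    urls = []
    for onclick in onclick_values:
        if not onclick:
            # The tag had no onclick attribute
            continue
        parts = onclick.split("', '")
        if len(parts) > 1:
            # Only keep values that contain at least two quoted arguments
            urls.append(parts[1])
    return urls


# Sample values copied from the output earlier in this answer.
samples = [
    "javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado72.stream/playlist.m3u8', '[MLB] 밀워키 vs 샌디에이고', '');",
    None,
    "javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado79.stream/playlist.m3u8', '[UEFA EL] 세비야 FC [white] vs FK 잘기리스 빌뉴스 [green]', '');",
]
for url in urls_from_onclicks(samples):
    print(url)
```

With the nameList from the script above, the input would be something like urls_from_onclicks(name.get("onclick") for name in nameList), since get() returns None when the attribute is absent.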
Upvotes: 1