Reputation: 11

python scraping extract onclick attribute

I want to extract only the http URL from the onclick attribute in the result below.

Something like this:

http://14.63.194.48:1935/fado/fado7.stream/playlist.m3u8

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("http://smantv.net/tv/ajax.tv_channel.php")
bsObj = BeautifulSoup(html, "html.parser")

nameList = bsObj.find_all("",{"class":"spo_wc_active"})
for name in nameList:
    print(name)

RESULT

<a href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado7.stream/playlist.m3u8', '[MLB] 워싱턴 vs 애틀랜타 (한국어중계)', '');" class="spo_wc spo_h spo_wc_active">방송보기</a>

Upvotes: 1

Answers (1)

Gergely M

Reputation: 733

So your code is

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen("http://smantv.net/tv/ajax.tv_channel.php")
bsObj = BeautifulSoup(html, "html.parser")

nameList = bsObj.find_all("", {"class": "spo_wc_active"})
for name in nameList: print (name)

Which works OK.

The output is

<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado72.stream/playlist.m3u8', '[MLB]  밀워키 vs 샌디에이고', '');">방송보기</a>
<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado79.stream/playlist.m3u8', '[UEFA EL]  세비야 FC [white] vs FK 잘기리스 빌뉴스 [green]', '');">방송보기</a>

If you print nameList itself (which is actually a bs4.element.ResultSet object) instead of iterating through it, you get

[<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado72.stream/playlist.m3u8', '[MLB]  밀워키 vs 샌디에이고', '');">방송보기</a>, <a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado79.stream/playlist.m3u8', '[UEFA EL]  세비야 FC [white]    vs FK 잘기리스 빌뉴스 [green]', '');">방송보기</a>]

So, if it is not empty, you can index into it like a list, e.g.

print(nameList[0])

<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8', '[PAR D1]  클루브 솔 데 아메리카    vs  인데펜디엔테 FBC', '');">방송보기</a>

In the same way, you can convert this element to a string, then split that string and finally strip the quotation marks. I'll show it in the interactive Python console:

>>> parts = str(nameList[0]).split(', ')
>>> parts
['<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle(\'\'', "'http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8'", "'[PAR D1]  클루브 솔 데 아메리카\tvs\t인데펜디엔테 FBC'", '\'\');">방송보기</a>']

So parts is now an actual list of plain strings, and you can get what you want:

>>> print(parts[1].strip("'"))
http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8

Of course, if you have more than one result, you need to wrap these steps in a loop.

If you fancy another way, I've just found that you can read the onclick attribute directly from the tag. It is already a string, so nothing needs converting:

>>> nameList[0]['onclick']
"javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8', '[PAR D1]  클루브 솔 데 아메리카\tvs\t인데펜디엔테 FBC', '');"
>>> type(nameList[0]['onclick'])
<class 'str'>

So once more, this time including the quotation marks in the split() separator:

>>> nameList[0]['onclick'].split("', '")[1]
'http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8'
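Since the question already imports re, a regular expression is another option. This is only a minimal sketch, assuming the URL is always passed as a single-quoted argument starting with http:// or https://. The names URL_IN_ONCLICK and extract_url are made up for illustration, and the sample string is the onclick value shown above.

```python
import re

# Matches the first single-quoted argument that starts with http:// or https://
# (assumption: the URL itself never contains a single quote).
URL_IN_ONCLICK = re.compile(r"'(https?://[^']+)'")


def extract_url(onclick):
    """Return the first quoted http(s) URL in an onclick string, or None."""
    match = URL_IN_ONCLICK.search(onclick)
    return match.group(1) if match else None


# onclick value taken from the console output above
sample = (
    "javascript:post_chtitle('', "
    "'http://14.63.194.48:1935/fado/fado32.stream/playlist.m3u8', "
    "'[PAR D1]  클루브 솔 데 아메리카\tvs\t인데펜디엔테 FBC', '');"
)

print(extract_url(sample))
```

Unlike indexing into the result of split(), this returns None instead of raising IndexError when the onclick value has a different shape.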

Complete code with loop

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the page and parse it
html = urlopen("http://smantv.net/tv/ajax.tv_channel.php")
bsObj = BeautifulSoup(html, "html.parser")

# An empty tag name matches any tag that carries the given class
nameList = bsObj.find_all("", {"class": "spo_wc_active"})

url_list = []
for name in nameList:
    # After splitting on ', ' the second piece is the quoted URL
    parts = str(name).split(', ')
    url = parts[1].strip("'")
    url_list.append(url)

So now you have url_list, a list of zero or more URLs that match your original pattern.

You can get the items by iterating through the list:

for url in url_list:
    print(url)
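The same loop can also be built on the onclick attribute rather than on str(name). The sketch below is one possible variant, not the only way to do it: it parses an inline copy of the output shown earlier instead of fetching the live page, and it skips tags whose onclick value is missing or has a different shape. The function name collect_urls and the third, onclick-less link in the sample are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

# Inline sample: the first two links are copied from the output above;
# the third one (no onclick) is hypothetical, added to show the guard.
SAMPLE_HTML = """
<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado72.stream/playlist.m3u8', '[MLB]  밀워키 vs 샌디에이고', '');">방송보기</a>
<a class="spo_wc spo_h spo_wc_active" href="#" onclick="javascript:post_chtitle('', 'http://14.63.194.48:1935/fado/fado79.stream/playlist.m3u8', '[UEFA EL]  세비야 FC [white] vs FK 잘기리스 빌뉴스 [green]', '');">방송보기</a>
<a class="spo_wc spo_h spo_wc_active" href="#">방송보기</a>
"""


def collect_urls(html):
    """Collect the second post_chtitle argument from every matching link."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for tag in soup.find_all(class_="spo_wc_active"):
        # .get() returns the default instead of raising KeyError
        onclick = tag.get("onclick", "")
        pieces = onclick.split("', '")
        # Only keep the value when the split produced a second piece
        if len(pieces) > 1:
            urls.append(pieces[1])
    return urls


url_list = collect_urls(SAMPLE_HTML)
for url in url_list:
    print(url)
```

To run it against the real page, pass the object returned by urlopen() to collect_urls instead of SAMPLE_HTML.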

Upvotes: 1
