hanan

Reputation: 632

Is it possible to scrape a JavaScript response with BeautifulSoup?

The server returns the response below. It is apparently JavaScript, and I want to scrape it but have not been able to.

Element.update("to_users2", "\n\n\n<div class="label-field-pair">\n <div class="label-field-pair11">\n <label for="student_grade">Select member\n <div class ="scrolable" >\n <div class="scroll-inside">\n <div class="hover"><a href="#" class="all" onClick="add_all_recipient('2,4')">Select all Add \n\n \n \n \n <div class="hover"><a href="#" before="Element.show('loader')" class="individual" onClick="add_recipient(2)" success="Element.hide('loader')">TestUserOne Add \n\n \n \n \n <div class="hover"><a href="#" before="Element.show('loader')" class="individual" onClick="add_recipient(4)" success="Element.hide('loader')">TestUserTwo Add \n\n \n \n \n \n \n\n\n\n");

I tried several approaches, such as converting the response object to a string, stripping every \n, and parsing the resulting HTML. I also tried requests_html, which came close but still did not give me the result I expected. I want to extract add_recipient(2) from the a tag's onClick attribute, together with the tag's text, TestUserOne in this case. With requests_html, I can do something like:

htm = '''Element.update("to_users2", "\n\n\n<div class="label-field-pair">\n <div class="label-field-pair11">\n <label for="student_grade">Select member\n <div class ="scrolable" >\n <div class="scroll-inside">\n <div class="hover"><a href="#" class="all" onClick="add_all_recipient('2,4')">Select all Add \n\n \n \n \n <div class="hover"><a href="#" before="Element.show('loader')" class="individual" onClick="add_recipient(2)" success="Element.hide('loader')">TestUserOne Add \n\n \n \n \n <div class="hover"><a href="#" before="Element.show('loader')" class="individual" onClick="add_recipient(4)" success="Element.hide('loader')">TestUserTwo Add \n\n \n \n \n \n \n\n\n\n");'''

from requests_html import HTML

html = HTML(html=htm)
print(html.find('a'))

and it gives the output

<Element 'a' href='\\"#\\"' before='\\"Element.show(\'loader\')\\"' class=('\\"individual\\"',) onclick='\\"add_recipient(4)\\"' success='\\"Element.hide(\'loader\')\\"'>,

From here I want to extract the onclick value and the text of each a tag, such as TestUserOne Add.

Where do I go from here? I have tried everything I can think of, to no avail. Any help would be appreciated.

Upvotes: 0

Views: 285

Answers (1)

arwt

Reputation: 11

I don't think it's possible to parse JavaScript values such as that using BeautifulSoup. One solution could be to use a regular expression:

main.py

import re

resp = """
Element.update("to_users2", "\n\n\n<div class="label-field-pair">\n <div class="label-field-pair11">\n <label for="student_grade">Select member\n <div class ="scrolable" >\n <div class="scroll-inside">\n <div class="hover"><a href="#" class="all" onClick="add_all_recipient('2,4')">Select all Add \n\n \n \n \n <div class="hover"><a href="#" before="Element.show('loader')" class="individual" onClick="add_recipient(2)" success="Element.hide('loader')">TestUserOne Add \n\n \n \n \n <div class="hover"><a href="#" before="Element.show('loader')" class="individual" onClick="add_recipient(4)" success="Element.hide('loader')">TestUserTwo Add \n\n \n \n \n \n \n\n\n\n");
"""

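# Capture the numeric id inside add_recipient(...) and the user name before " Add".
# (\d+) accepts multi-digit ids; [^>]* stops at the end of the opening tag.
print(re.findall(r"add_recipient\((\d+)\)\" success=[^>]*>(\w+) Add", resp))

$ python main.py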
[('2', 'TestUserOne'), ('4', 'TestUserTwo')]
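
If the regular expression turns out to be too brittle, another option is to unwrap the JavaScript call first and then parse what is left as ordinary HTML. The sketch below is only an illustration under a few assumptions: the payload is always the second argument of Element.update, the real response escapes its double quotes as \" (the requests_html output in the question suggests so), and the a tags are properly closed. RAW is a shortened, hypothetical sample, and extract_html, RecipientParser and scrape_recipients are names made up for this example. It uses the standard library's html.parser so that it runs without extra packages.

```python
import re
from html.parser import HTMLParser

# Hypothetical, shortened sample shaped like the response in the question.
# Raw strings keep the backslashes, so \" and \n stay escaped as in JavaScript.
RAW = (
    r'''Element.update("to_users2", "\n'''
    r'''<div class=\"hover\"><a href=\"#\" class=\"all\" onClick=\"add_all_recipient('2,4')\">Select all Add</a></div>\n'''
    r'''<div class=\"hover\"><a href=\"#\" class=\"individual\" onClick=\"add_recipient(2)\">TestUserOne Add</a></div>\n'''
    r'''<div class=\"hover\"><a href=\"#\" class=\"individual\" onClick=\"add_recipient(4)\">TestUserTwo Add</a></div>\n'''
    r'''");'''
)


def extract_html(js):
    """Return the HTML held in the second argument of Element.update(...)."""
    m = re.search(r'Element\.update\("[^"]*",\s*"(.*)"\);?\s*$', js, re.S)
    if m is None:
        raise ValueError("not an Element.update(...) call")
    # Undo only the two escapes seen in this payload: \n and \".
    return m.group(1).replace("\\n", "\n").replace('\\"', '"')


class RecipientParser(HTMLParser):
    """Collect (id, link text) for every <a onclick="add_recipient(N)">."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._current_id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # HTMLParser lowercases attribute names, so onClick arrives as onclick.
        onclick = dict(attrs).get("onclick") or ""
        m = re.fullmatch(r"add_recipient\((\d+)\)", onclick.strip())
        self._current_id = m.group(1) if m else None
        self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            self.results.append((self._current_id, "".join(self._text).strip()))
            self._current_id = None


def scrape_recipients(js):
    parser = RecipientParser()
    parser.feed(extract_html(js))
    parser.close()
    return parser.results


print(scrape_recipients(RAW))
# [('2', 'TestUserOne Add'), ('4', 'TestUserTwo Add')]
```

The "Select all" link is skipped because its onclick does not match add_recipient(N). If you would rather stay with BeautifulSoup, the string returned by extract_html should be something you can hand to it in place of the standard-library parser.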

Upvotes: 1
