Reputation: 425
I don't have much experience with HTML, so I hope I'm using the right terminology.
I have the following HTML snippet:
<script type="text/javascript">
var graph_raw_data = [{"parent_id": 844, "process_id": 236, "process_name": "C0nw0nk Steam Patcher.exe","first_seen":
"2355-02-21 00:00:00,183", "calls": [{"category": "system",
"timestamp": "2355-02-21 00:00:00,193", "api": "LdrGetDllHandle"},
{"category": "process", "timestamp": "2015-02-21 18:59:49,584",
"api": "ExitProcess"}]}];
</script>
This node is nested within a few nodes with the following pattern:
<div class="tab-content">
How can I load graph_raw_data into a Python variable, something similar to a dictionary?
Basically, I need to iterate through all the nodes and find the desired one. How can I do that in Python?
I fetch the HTML with this Python code:
f = urllib2.urlopen(url)
page_data = f.read()
soup = BeautifulSoup(page_data)
Upvotes: 0
Views: 54
Reputation: 5405
Use a regex to extract the string that holds the variable's value, then use json.loads
to convert it into a Python object (here, a list of dictionaries).
import json
import re
html="""<script type="text/javascript">
var graph_raw_data = [{"parent_id": 844, "process_id": 236, "process_name": "C0nw0nk Steam Patcher.exe","first_seen":
"2355-02-21 00:00:00,183", "calls": [{"category": "system",
"timestamp": "2355-02-21 00:00:00,193", "api": "LdrGetDllHandle"},
{"category": "process", "timestamp": "2015-02-21 18:59:49,584",
"api": "ExitProcess"}]}];
</script>"""
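# Strip newlines so "." can match across the whole assignment,
# then capture everything up to the first ";".
graph_raw_data = re.search(r'var graph_raw_data = (.*?);', html.replace('\n', '')).group(1)
data = json.loads(graph_raw_data)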
print(data)
>>>[{'parent_id': 844, 'calls': [{'timestamp': '2355-02-21 00:00:00,193', 'category': 'system', 'api': 'LdrGetDllHandle'}, {'timestamp': '2015-02-21 18:59:49,584', 'category': 'process', 'api': 'ExitProcess'}], 'process_name': 'C0nw0nk Steam Patcher.exe', 'first_seen': '2355-02-21 00:00:00,183', 'process_id': 236}]
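One caveat with the regex above: the non-greedy match stops at the first ";", so it would cut the JSON short if any string value contained a semicolon. A minimal sketch of a more tolerant variant, assuming the page always uses the form `var <name> = <JSON>`: find the start of the assignment with a regex and let json.JSONDecoder.raw_decode parse exactly one JSON value from there. The helper name extract_js_variable and the "note" field in the sample data are made up for illustration and are not part of the original page.

```python
import json
import re


def extract_js_variable(html, name):
    """Return the JSON value assigned to `var <name> = ...` inside html.

    Hypothetical helper: assumes the assigned value is valid JSON.
    """
    # Find the start of the assignment; \s* tolerates varying whitespace
    # and leaves the match ending right at the first character of the value.
    match = re.search(r'var\s+' + re.escape(name) + r'\s*=\s*', html)
    if match is None:
        raise ValueError('variable %r not found' % name)
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here the trailing ";" and "</script>").
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


# Shortened sample; the "note" field is invented to show a ";" inside a string.
html = """<script type="text/javascript">
var graph_raw_data = [{"process_id": 236, "note": "a; b",
"calls": [{"category": "system", "api": "LdrGetDllHandle"},
{"category": "process", "api": "ExitProcess"}]}];
</script>"""

data = extract_js_variable(html, 'graph_raw_data')

# The result is a list of dictionaries, so it can be iterated directly.
for process in data:
    for call in process['calls']:
        print(process['process_id'], call['category'], call['api'])
```

With the real page, passing page_data from the question in place of the sample string should work the same way, as long as the variable is assigned only once in the document.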
Upvotes: 1