IceCube

Reputation: 425

HTML variable into python variable

I don't have much experience with HTML, so I hope I'm using the right terminology.

I have the following HTML snippet:

   <script type="text/javascript">
             var graph_raw_data = [{"parent_id": 844, "process_id": 236, "process_name":    "C0nw0nk Steam Patcher.exe","first_seen":
       "2355-02-21 00:00:00,183", "calls": [{"category": "system",
       "timestamp": "2355-02-21 00:00:00,193", "api": "LdrGetDllHandle"},
       {"category": "process", "timestamp": "2015-02-21 18:59:49,584",
       "api": "ExitProcess"}]}];
  </script>

This node is nested within a few nodes that follow this pattern:

<div class="tab-content">

How can I load graph_raw_data into a Python variable, something similar to a dictionary?

Basically, I need to iterate through all the nodes and find the desired one. How can I do that in Python?

I fetch the HTML with this Python code:

import urllib2
from bs4 import BeautifulSoup  # assuming BeautifulSoup 4 (bs4)

f = urllib2.urlopen(url)
page_data = f.read()
soup = BeautifulSoup(page_data, 'html.parser')

Upvotes: 0

Views: 54

Answers (1)

Hunger

Reputation: 5405

Use a regex to extract the text assigned to the variable, then use json.loads to convert that JSON text into a Python object (here, a list containing one dictionary).

import json
import re

html="""<script type="text/javascript">
             var graph_raw_data = [{"parent_id": 844, "process_id": 236, "process_name":    "C0nw0nk Steam Patcher.exe","first_seen":
       "2355-02-21 00:00:00,183", "calls": [{"category": "system",
       "timestamp": "2355-02-21 00:00:00,193", "api": "LdrGetDllHandle"},
       {"category": "process", "timestamp": "2015-02-21 18:59:49,584",
       "api": "ExitProcess"}]}];
  </script>"""

# Drop newlines so '.' can span the whole assignment, then capture everything
# up to the first ';' (this assumes the data itself contains no ';').
graph_raw_data = re.search(r'var graph_raw_data = (.*?);', html.replace('\n', '')).group(1)
data = json.loads(graph_raw_data)
print(data)
# Output (key order may differ between Python versions):
# [{'parent_id': 844, 'calls': [{'timestamp': '2355-02-21 00:00:00,193', 'category': 'system', 'api': 'LdrGetDllHandle'}, {'timestamp': '2015-02-21 18:59:49,584', 'category': 'process', 'api': 'ExitProcess'}], 'process_name': 'C0nw0nk Steam Patcher.exe', 'first_seen': '2355-02-21 00:00:00,183', 'process_id': 236}]
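
If the page contains several script nodes, or the data might contain a ';' inside a string, a slightly more defensive variant may help. The following is only a sketch, assuming Python 3 and the standard library's html.parser instead of BeautifulSoup; ScriptCollector and extract_js_variable are hypothetical helper names, not part of any library. It walks every script node, looks for the assignment, and lets json.JSONDecoder.raw_decode read exactly one JSON value, ignoring whatever follows it.

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self.in_script:
            self.scripts[-1] += data


def extract_js_variable(html, name):
    """Return the JSON value assigned via `var <name> = ...`, or None if absent."""
    collector = ScriptCollector()
    collector.feed(html)
    pattern = re.compile(r"var\s+" + re.escape(name) + r"\s*=\s*")
    decoder = json.JSONDecoder()
    for script in collector.scripts:
        match = pattern.search(script)
        if match:
            # raw_decode parses one JSON value starting at the given index
            # and does not care about trailing text such as the closing ';'.
            value, _end = decoder.raw_decode(script, match.end())
            return value
    return None


# Made-up sample page; note the ';' inside a string value.
page = (
    '<div class="tab-content"><script type="text/javascript">\n'
    '  var graph_raw_data = [{"process_id": 236, "note": "a;b"}];\n'
    '</script></div>'
)
data = extract_js_variable(page, "graph_raw_data")
print(data[0]["process_id"])
```

This only works when the right-hand side of the assignment is valid JSON, as it is in the question; a general JavaScript literal (single quotes, unquoted keys) would need a different parser.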

Upvotes: 1
