alias micheal

Reputation: 141

How to parse HAR file to extract text content?

I saved my network data in a HAR file. The file contains multiple similar dictionaries, and several of them contain a specific word that I want to use as an indicator. I want to extract each whole dictionary whose content contains that word and collect all of those responses in an array.

I am fairly new to Python (and coding in general), so an explain-like-I'm-five kind of explanation would help me greatly.

Upvotes: 14

Views: 24233

Answers (2)

Anwarvic

Reputation: 12992

You can use the haralyzer module. You can install it with pip like so:

pip install haralyzer

The following code uses this sample har file:

>>> import json
>>> from haralyzer import HarParser, HarPage
>>>
>>> with open('sample.har', 'r', encoding='utf-8') as f:
...     har_parser = HarParser(json.loads(f.read()))
>>>
>>> data = har_parser.har_data
>>> type(data)
<class 'dict'>
>>>
>>> data.keys()
dict_keys(['version', 'creator', 'pages', 'entries'])
>>>
>>> har_parser.har_data["pages"]
[{'startedDateTime': '2013-08-24T20:16:16.997Z', 'id': 'page_1', 'title': 'http://ericduran.github.io/chromeHAR/', 'pageTimings': {'onContentLoad': 317, 'onLoad': 406}}]

For more info, check the official GitHub repository.
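If installing a package is not an option, the same structure can be reached with only the standard library, because a HAR file is plain JSON with everything nested under a top-level "log" key. The following is a minimal sketch, not part of haralyzer: the helper name list_entries and the tiny in-memory sample_har dict are made up for illustration, and a real file would be loaded with json.load instead.

```python
import json


def list_entries(har):
    """Return (url, mime_type) pairs for every entry in a parsed HAR dict."""
    rows = []
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        # mimeType lives next to the optional 'text' key under response -> content.
        mime = entry["response"]["content"].get("mimeType", "")
        rows.append((url, mime))
    return rows


# Hypothetical stand-in for: json.load(open('sample.har', encoding='utf-8'))
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {
                "request": {"url": "https://example.com/api"},
                "response": {"content": {"mimeType": "application/json",
                                         "text": '{"ok": true}'}},
            },
            {
                "request": {"url": "https://example.com/logo.png"},
                "response": {"content": {"mimeType": "image/png"}},
            },
        ],
    }
}

for url, mime in list_entries(sample_har):
    print(url, mime)
```

Listing the URL and MIME type of each entry first is a quick way to see which responses are text-based before looking at their content.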

Upvotes: 17

MrName

Reputation: 2529

Tacking on to the answer from Anwarvic: entries in the HAR file that have a text-based content type store the actual content under the key path entry -> response -> content -> text. Here is an example that prints the content of all such entries.

# .... initialize har parser as per documentation ....

for page in har_parser.pages:
    for entry in page.entries:
        # Be careful accessing the 'text' key: it does not exist for non-text
        # responses, so use .get() with an empty-string default.
        print(entry['response']['content'].get('text', ''))

From there you can use the in operator or a regex to check whether the response text of an entry contains the text you are looking for.
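To tie this back to the question, here is a minimal sketch of collecting the matching entries into a list. It works on plain entry dictionaries, so it does not depend on haralyzer. The function names find_matching_entries and find_entries_by_pattern, the keyword "token", and the sample_entries data are all made up for illustration.

```python
import re


def response_text(entry):
    """Return the response body text of an entry, or '' if there is none."""
    return entry["response"]["content"].get("text") or ""


def find_matching_entries(entries, keyword):
    """Collect every entry dict whose response text contains keyword."""
    return [entry for entry in entries if keyword in response_text(entry)]


def find_entries_by_pattern(entries, pattern):
    """Same idea, but match the response text against a regular expression."""
    regex = re.compile(pattern)
    return [entry for entry in entries if regex.search(response_text(entry))]


# Hypothetical entries shaped like the dicts in a HAR file's entry list.
sample_entries = [
    {"response": {"content": {"mimeType": "application/json",
                              "text": '{"token": "abc123"}'}}},
    {"response": {"content": {"mimeType": "text/html",
                              "text": "<p>hello</p>"}}},
    {"response": {"content": {"mimeType": "image/png"}}},  # no 'text' key
]

matches = find_matching_entries(sample_entries, "token")
print(len(matches))
```

With the parser from the first answer, the list under har_parser.har_data["entries"] should be usable as the entries argument, since har_data is a plain dict. The regex variant is only needed when a simple substring check is not precise enough.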

Upvotes: 1
