mbvee

Reputation: 391

Python String Extraction from text file

I have written a Python script that calls a server and fetches the response. When making the call, it passes a few values in the body of the request. These values are supposed to be read from a text file. A sample of the text file is given below.

My text file sample:


Host: localhost:8080
Connection: keep-alive
.....
.....
{"token":"abcdefhutryskslkslksslslks=="}POST /fill/entry/login HTTP/1.1

Host: localhost:8080
Connection: keep-alive
.....
.....
{"value":"abcdefghijklmnopqrstuvwxyz",
 "pass":"123456789zxcvbnmljhgfds",
 "token":"abcdefghijklmnopqrstuvwxyz=="}POST /fill/health HTTP/1.1

As the sample shows, the file contains several different responses. I need to capture the string that starts with {"value" and ends with "} (the second part of the response in the sample).

Searching Stack Overflow, I found questions about extracting a substring, but they all have a definite start point and a definite end point. In my case, the start point can be identified uniquely with the search string {"value", but the end point cannot, because the text file contains several other closing braces as well.

Any suggestions or pointers on fetching this specific part of the string from the text file (as described above) would be really helpful.

Upvotes: 1

Views: 176

Answers (2)

Juan Diego Godoy Robles

Reputation: 14945

An example using the re module, from the interpreter:

>>> with open('file') as f:
...    raw = f.read()
>>> 
>>> import re
>>> pat = re.compile(r'{"value":[^{]+}')
>>> pat.findall(raw)
['{"value":"abcdefghijklmnopqrstuvwxyz",\n "pass":"123456789zxcvbnmljhgfds",\n "token":"abcdefghijklmnopqrstuvwxyz=="}']
>>> pat.search(raw).group()
'{"value":"abcdefghijklmnopqrstuvwxyz",\n "pass":"123456789zxcvbnmljhgfds",\n "token":"abcdefghijklmnopqrstuvwxyz=="}'
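If the captured part is always valid JSON (an assumption based on the sample), an alternative that does not depend on counting braces is to let the json module find the end of the object. The sketch below is not part of the session above; the function name extract_object and the inline sample string are made up for illustration.

```python
import json

def extract_object(raw, marker='{"value"'):
    """Return (parsed_object, matched_text) for the first JSON object
    that begins with marker, or None if the marker is not found."""
    start = raw.find(marker)
    if start == -1:
        return None
    # raw_decode parses one JSON value from the start of the string and
    # reports where it ended, ignoring any trailing text after it.
    obj, end = json.JSONDecoder().raw_decode(raw[start:])
    return obj, raw[start:start + end]

# Shortened stand-in for the text file shown in the question.
sample = (
    'Host: localhost:8080\n'
    '{"token":"abcdefhutryskslkslksslslks=="}POST /fill/entry/login HTTP/1.1\n'
    '\n'
    'Host: localhost:8080\n'
    '{"value":"abcdefghijklmnopqrstuvwxyz",\n'
    ' "pass":"123456789zxcvbnmljhgfds",\n'
    ' "token":"abcdefghijklmnopqrstuvwxyz=="}POST /fill/health HTTP/1.1\n'
)

obj, text = extract_object(sample)
print(obj["pass"])
print(text)
```

Because the JSON parser decides where the object ends, this keeps working if a value ever contains a brace character or a nested object, cases where a brace-based pattern may stop too early or too late.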

Upvotes: 2

Haroldo_OK

Reputation: 7230

If your file isn't very big, you can read the entire text into a single string with file.read() (file.readlines() would give you a list of lines instead), then use the regular expression library, re, to extract the required parts.
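A minimal sketch of this approach, assuming the wanted block contains no nested braces and no brace characters inside its string values. The function name find_value_blocks, the pattern, and the temporary file are illustrative and not taken from the question.

```python
import os
import re
import tempfile

# Match from {"value": up to the first closing brace. Excluding both
# brace characters keeps the match from running past the end of the object.
PATTERN = re.compile(r'\{"value":[^{}]+\}')

def find_value_blocks(path):
    """Read the whole file into one string and return every matching block."""
    with open(path) as f:
        raw = f.read()  # one string, so the match can span several lines
    return PATTERN.findall(raw)

# Write a shortened copy of the sample to a temporary file for the demo.
sample = (
    '{"token":"abcdefhutryskslkslksslslks=="}POST /fill/entry/login HTTP/1.1\n'
    '\n'
    'Host: localhost:8080\n'
    '{"value":"abcdefghijklmnopqrstuvwxyz",\n'
    ' "pass":"123456789zxcvbnmljhgfds",\n'
    ' "token":"abcdefghijklmnopqrstuvwxyz=="}POST /fill/health HTTP/1.1\n'
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

blocks = find_value_blocks(tmp.name)
os.remove(tmp.name)
print(blocks)
```

For very large files, reading everything at once may use too much memory; in that case the file would need to be processed in chunks, which this sketch does not attempt.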

Upvotes: 1
