Extracting certain values from a text file in Python

Question

I have text files in the format below, and I need to extract all Range of Motion and Location values. In some files the value is on the next line, and in some files it is not given at all.

File1.txt:

Functional Assessment: Patient currently displays the following functional 
limitations and would benefit from treatment to maximize functional use and 
pain reduction: Range of Motion: limited . ADLs: limited . Gait: limited . 
Stairs: limited . Squatting: limited . Work participation status: limited . 
Current Status: The patient's current status is improving. 

Location: Right side

Expected output: limited | Right side

File2.txt:

Functional Assessment: Patient currently displays the following functional 
limitations and would benefit from treatment to maximize functional use and 
pain reduction: 
Range of Motion: 
painful 
and
limited

Strength: 
limited

Expected output: painful and limited | Not given

This is the code which I am trying:

if "Functional Assessment:" in line:
    result=str(line.rsplit('Functional Assessment:'))
    romvalue = result.rsplit('Range of Motion:')[-1].split()[0]
    outputfile.write(romvalue)
    partofbody = result.rsplit('Location:')[-1].split()[0]
    outputfile.write(partofbody)

I am not getting the output I want with this code. Can someone please help?

Wiktor Stribiżew · Accepted Answer

You may collect all lines after a line that starts with Functional Assessment:, join them and use the following regex:

(?sm)\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)

See the regex demo.

Details

(?sm) - re.S and re.M modifiers
\b - word boundary
(Location|Range of Motion) - Group 1: either Location or Range of Motion
:\s* - a colon and 0+ whitespaces
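([^\W_].*?) - Group 2: a letter or digit followed by any 0+ chars, as few as possible (with re.S, . also matches line breaks)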
\s* - 0+ whitespaces
(?=(?:\.\s*)?[^\W\d_]+:|\Z) - a positive lookahead that, immediately to the right of the current location, requires
- (?:\.\s*)? - an optional sequence of . and 0+ whitespaces
- [^\W\d_]+: - 1+ letters followed with :
- | - or
- \Z - end of string.
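
To see the pattern in isolation, here is a minimal sketch that runs it on a short inline string. The `sample` text is a made-up, shortened variant of File1.txt from the question, not the real file.

```python
import re

pattern = re.compile(
    r'\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)',
    re.S | re.M,
)

# Hypothetical, shortened sample modeled on File1.txt from the question
sample = "Range of Motion: limited . ADLs: limited .\n\nLocation: Right side\n"

# findall returns one (label, value) tuple per match, in order of appearance
matches = pattern.findall(sample)
print(matches)  # [('Range of Motion', 'limited'), ('Location', 'Right side')]
```

Note that "ADLs: limited" is skipped because Group 1 only accepts the two labels of interest; the lookahead merely uses "ADLs:" as the stop marker for the preceding value.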

Here is a Python demo:

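import re

reg = re.compile(r'\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)', re.S | re.M)

# Read each file's text; File1.txt and File2.txt are the samples from the question
files = []
for name in ("File1.txt", "File2.txt"):
    with open(name) as f:
        files.append(f.read())

for file in files:
    flag = False  # becomes True once the "Functional Assessment:" line is seen
    tmp = ""
    for line in file.splitlines():
        if line.startswith("Functional Assessment:"):
            tmp = tmp + line + "\n"
            flag = not flag
        elif flag:
            tmp = tmp + line + "\n"
    # findall returns (label, value) tuples, which dict() turns into a mapping
    print(dict(reg.findall(tmp)))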

Output (for the two texts you posted):

{'Location': 'Right side', 'Range of Motion': 'limited'}
{'Range of Motion': 'painful \nand\nlimited'}
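
To get from these dictionaries to the "value | value" lines shown in the question, one possible approach is to collapse whitespace in each value and fall back to "Not given" for a missing key. This is only a sketch; `format_result` is a hypothetical helper name, not part of the answer's code.

```python
def format_result(found):
    """Render extracted values as 'range of motion | location'."""
    def clean(key):
        # Collapse line breaks and repeated spaces; use "Not given" if absent
        value = found.get(key)
        return " ".join(value.split()) if value else "Not given"
    return clean("Range of Motion") + " | " + clean("Location")

first = format_result({'Location': 'Right side', 'Range of Motion': 'limited'})
second = format_result({'Range of Motion': 'painful \nand\nlimited'})
print(first)   # limited | Right side
print(second)  # painful and limited | Not given
```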
