Slickmind

Reputation: 444

Extracting certain values from a text file in Python

I have text files in the format below, and I need to extract all Range of Motion and Location values. In some files the value is on the next line, and in some files it is not given at all.

File1.txt:

Functional Assessment: Patient currently displays the following functional 
limitations and would benefit from treatment to maximize functional use and 
pain reduction: Range of Motion: limited . ADLs: limited . Gait: limited . 
Stairs: limited . Squatting: limited . Work participation status: limited . 
Current Status: The patient's current status is improving. 

Location: Right side 

Expected output: limited | Right side

File2.txt:

Functional Assessment: Patient currently displays the following functional 
limitations and would benefit from treatment to maximize functional use and 
pain reduction: 
Range of Motion: 
painful 
and
limited

Strength: 
limited 

Expected output: painful and limited | Not given

This is the code I am trying:

if "Functional Assessment:" in line:
    result=str(line.rsplit('Functional Assessment:'))
    romvalue = result.rsplit('Range of Motion:')[-1].split()[0]
    outputfile.write(romvalue)
    partofbody = result.rsplit('Location:')[-1].split()[0]
    outputfile.write(partofbody)

This code does not give me the output I want. Can someone please help?

Upvotes: 1

Views: 720

Answers (1)

Wiktor Stribiżew

Reputation: 626689

You may collect all lines starting from the line that begins with Functional Assessment:, join them, and run the following regex on the result:

(?sm)\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)

See the regex demo.

Details

  • (?sm) - inline re.S (DOTALL, so . also matches newlines) and re.M (MULTILINE) modifiers
  • \b - word boundary
  • (Location|Range of Motion) - Group 1: either Location or Range of Motion
  • :\s* - a colon and 0+ whitespaces
  • ([^\W_].*?) - Group 2: a letter or digit (any word character except _), then any 0+ characters, as few as possible
  • \s* - 0+ whitespaces
  • (?=(?:\.\s*)?[^\W\d_]+:|\Z) - a positive lookahead that, immediately to the right of the current location, requires
    • (?:\.\s*)? - an optional sequence of . and 0+ whitespaces
    • [^\W\d_]+: - 1+ letters followed by :
    • | - or
    • \Z - end of string.
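The same pattern can be written in re.VERBOSE mode so that each part of the breakdown sits next to its own comment. This is a sketch meant to be equivalent to the one-line pattern; note that verbose mode ignores unescaped whitespace, so the spaces in Range of Motion have to be escaped. The sample string is a shortened, made-up excerpt of File1.txt.

```python
import re

# Verbose rewrite of the one-line pattern. Unescaped whitespace is ignored
# in re.X mode, so the literal spaces in "Range of Motion" are escaped.
PATTERN = re.compile(r"""
    \b
    (Location|Range\ of\ Motion)   # Group 1: the field name
    :\s*                           # a colon and 0+ whitespaces
    ([^\W_].*?)                    # Group 2: a letter or digit, then as few chars as possible
    \s*                            # 0+ trailing whitespaces (not captured)
    (?=                            # lookahead: what must follow the value
        (?:\.\s*)?[^\W\d_]+:       #   an optional ". " and then the next "Label:"
      | \Z                         #   or the end of the string
    )
""", re.S | re.M | re.X)

sample = "pain reduction: Range of Motion: limited . ADLs: limited .\nLocation: Right side \n"
print(PATTERN.findall(sample))
# → [('Range of Motion', 'limited'), ('Location', 'Right side')]
```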

Here is a Python demo:

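import re
from pathlib import Path

reg = re.compile(r'\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)', re.S | re.M)

# Read each file's full contents into a string
files = [Path(name).read_text() for name in ("File1.txt", "File2.txt")]

for file in files:
    flag = False  # becomes True once the "Functional Assessment:" line is seen
    tmp = ""
    for line in file.splitlines():
        if line.startswith("Functional Assessment:"):
            flag = True
        if flag:
            tmp += line + "\n"  # keep this line and every line after it
    print(dict(reg.findall(tmp)))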

Output (for the two texts you posted):

{'Range of Motion': 'limited', 'Location': 'Right side'}
{'Range of Motion': 'painful \nand\nlimited'}
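The question asks for one "range of motion | location" line per file, with "Not given" when a field is missing. A minimal sketch of that last step, assuming each file's contents are already in a string: format_row is a hypothetical helper name, and collapsing whitespace with split() and join() is just one way to turn 'painful \nand\nlimited' into 'painful and limited'.

```python
import re

REG = re.compile(r'\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)', re.S | re.M)

def format_row(text):
    """Hypothetical helper: return 'range of motion | location' for one file's text."""
    # Keep everything from the line that starts with "Functional Assessment:"
    m = re.search(r"^Functional Assessment:", text, re.M)
    block = text[m.start():] if m else ""
    found = dict(REG.findall(block))
    # Collapse newlines and runs of spaces inside each value; default to "Not given"
    rom = " ".join(found.get("Range of Motion", "Not given").split())
    loc = " ".join(found.get("Location", "Not given").split())
    return f"{rom} | {loc}"

file2 = ("Functional Assessment: Patient currently displays the following functional\n"
         "limitations and would benefit from treatment to maximize functional use and\n"
         "pain reduction:\nRange of Motion:\npainful\nand\nlimited\n\nStrength:\nlimited\n")
print(format_row(file2))  # → painful and limited | Not given
```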

Upvotes: 3
