I have text files in the format below and need to extract all Range of Motion and Location values. In some files the value is given on the next line, and in some it is not given at all.
File1.txt:
Functional Assessment: Patient currently displays the following functional
limitations and would benefit from treatment to maximize functional use and
pain reduction: Range of Motion: limited . ADLs: limited . Gait: limited .
Stairs: limited . Squatting: limited . Work participation status: limited .
Current Status: The patient's current status is improving.
Location: Right side
Expected output: limited | Right side
File2.txt:
Functional Assessment: Patient currently displays the following functional
limitations and would benefit from treatment to maximize functional use and
pain reduction:
Range of Motion:
painful
and
limited
Strength:
limited
Expected output: painful and limited | Not given
This is the code which I am trying:
if "Functional Assessment:" in line:
    result=str(line.rsplit('Functional Assessment:'))
    romvalue = result.rsplit('Range of Motion:')[-1].split()[0]
    outputfile.write(romvalue)
    partofbody = result.rsplit('Location:')[-1].split()[0]
    outputfile.write(partofbody)
This code does not produce the output I want. Can someone please help?
You may collect all lines from the line that starts with Functional Assessment: onward, join them, and use the following regex:
(?sm)\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)
See the regex demo.
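Since the linked demo is not reproduced here, a minimal self-contained check of the regex against the File1.txt block may help. It assumes the block has already been joined into one string; the variable names file1 and matches are illustrative only. Because the pattern has two capture groups, re.findall returns a list of (label, value) tuples.

```python
import re

# The regex from the answer: group 1 is the label, group 2 is its value.
reg = re.compile(
    r'\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)',
    re.S | re.M,
)

# The Functional Assessment block from File1.txt, joined into one string.
file1 = (
    "Functional Assessment: Patient currently displays the following functional\n"
    "limitations and would benefit from treatment to maximize functional use and\n"
    "pain reduction: Range of Motion: limited . ADLs: limited . Gait: limited .\n"
    "Stairs: limited . Squatting: limited . Work participation status: limited .\n"
    "Current Status: The patient's current status is improving.\n"
    "Location: Right side\n"
)

# findall returns one (label, value) tuple per match, in document order.
matches = reg.findall(file1)
print(matches)
```

The lazy .*? stops at "limited" because the lookahead then sees ". ADLs:", and it stops at "Right side" because only trailing whitespace and the end of the string (\Z) follow.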
Details
(?sm) - inline re.S (DOTALL) and re.M (MULTILINE) modifiers
\b - word boundary
(Location|Range of Motion) - Group 1: either Location or Range of Motion
:\s* - a colon and 0+ whitespaces
([^\W_].*?) - Group 2: a letter or digit, then any 0+ characters (newlines included, because of re.S), as few as possible
\s* - 0+ whitespaces
(?=(?:\.\s*)?[^\W\d_]+:|\Z) - a positive lookahead that, immediately to the right of the current location, requires
  (?:\.\s*)? - an optional period (.) followed by 0+ whitespaces
  [^\W\d_]+: - 1+ letters followed by :
  | - or
  \Z - the end of the string.
Here is a Python demo:
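import re

# Group 1 = label (Location / Range of Motion), group 2 = its value
reg = re.compile(r'\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)', re.S | re.M)

for path in ["File1.txt", "File2.txt"]:  # file names taken from the question
    with open(path) as f:
        text = f.read()
    flag = False  # becomes True once the "Functional Assessment:" line is seen
    tmp = ""
    for line in text.splitlines():
        if line.startswith("Functional Assessment:"):
            flag = True
        if flag:
            tmp += line + "\n"
    # findall returns (label, value) tuples, which dict() turns into a mapping
    print(dict(reg.findall(tmp)))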
Output (for the two texts you posted):
{'Location': 'Right side', 'Range of Motion': 'limited'}
{'Range of Motion': 'painful \nand\nlimited'}
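The demo prints raw values, while the question asks for one line per file such as painful and limited | Not given. A minimal sketch under these assumptions: extract_values is a hypothetical helper name, a missing label falls back to the question's "Not given" placeholder, and runs of whitespace (including the newlines inside multi-line values) are collapsed to single spaces.

```python
import re

# Same pattern as in the answer: group 1 = label, group 2 = value.
REG = re.compile(
    r'\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)',
    re.S | re.M,
)


def extract_values(text, default="Not given"):
    """Hypothetical helper: return 'ROM | Location' for one file's text."""
    lines = text.splitlines()
    # Keep everything from the first "Functional Assessment:" line onward.
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("Functional Assessment:")),
        len(lines),
    )
    block = "\n".join(lines[start:])
    found = dict(REG.findall(block))
    # Collapse newlines and repeated spaces: "painful\nand\nlimited" -> "painful and limited".
    rom = " ".join(found.get("Range of Motion", default).split())
    loc = " ".join(found.get("Location", default).split())
    return f"{rom} | {loc}"


file2 = (
    "Functional Assessment: Patient currently displays the following functional\n"
    "limitations and would benefit from treatment to maximize functional use and\n"
    "pain reduction:\n"
    "Range of Motion:\n"
    "painful\n"
    "and\n"
    "limited\n"
    "Strength:\n"
    "limited\n"
)
print(extract_values(file2))  # painful and limited | Not given
```

Collapsing whitespace with " ".join(value.split()) leaves the regex unchanged and only normalizes the captured value, so File1 still yields limited | Right side.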