Reputation: 841
I have the following string from which I want to extract data:
text_example = '\nExample text \nTECHNICAL PARTICULARS\nLength oa: ...............189.9m\nLength bp: ........176m\nBreadth moulded: .......26.4m\nDepth moulded to main deck: ....9.2m\n'
For example, my preferred output looks like:
LOA = 189.9
LBP = 176.0
BM = 26.4
DM = 9.2
Upvotes: 0
Views: 291
Reputation: 366
import re
text_example = '\nExample text \nTECHNICAL PARTICULARS\nLength oa: ...............189.9m\nLength bp: ........176m\nBreadth moulded: .......26.4m\nDepth moulded to main deck: ....9.2m\n'
# capture all the characters BEFORE the ':' character
variables = re.findall(r'(.*?):', text_example)
# matches all floats and integers (does not account for minus signs)
values = re.findall(r'(\d+(?:\.\d+)?)', text_example)
# zip into a dictionary (this assumes both regular expressions return the same number of results)
result = dict(zip(variables, values))
print(result)
--> {'Length oa': '189.9', 'Breadth moulded': '26.4', 'Length bp': '176', 'Depth moulded to main deck': '9.2'}
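To get from this dictionary to the output asked for in the question, the values still need converting to floats and the labels need mapping to short names. A minimal sketch, assuming the label-to-abbreviation table below; `ABBREVIATIONS` and `to_short_names` are names made up for illustration, and only the four abbreviations themselves come from the question:

```python
# Hypothetical lookup table; the abbreviations are taken from the question's desired output.
ABBREVIATIONS = {
    'Length oa': 'LOA',
    'Length bp': 'LBP',
    'Breadth moulded': 'BM',
    'Depth moulded to main deck': 'DM',
}

def to_short_names(result):
    """Rename known labels and convert the captured strings to floats."""
    return {ABBREVIATIONS[label]: float(value)
            for label, value in result.items()
            if label in ABBREVIATIONS}  # labels without an abbreviation are skipped

# Same content as the dictionary printed above.
result = {'Length oa': '189.9', 'Length bp': '176',
          'Breadth moulded': '26.4', 'Depth moulded to main deck': '9.2'}

short = to_short_names(result)
for name, value in short.items():
    print(name, '=', value)
```

Labels that are missing from the table are silently dropped here; raising an error instead may be the better choice if every line is expected to be known.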
Upvotes: 1
Reputation: 581
You can write a single regex that captures the label and the value together, then pick the match you need by index:
re.findall(r'(\\n|\n)([A-Za-z\s]*)(?:(\:\s*\.+))(\d*\.*\d*)',text_example)[2]
('\n', 'Breadth moulded', ': .......', '26.4')
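One thing to watch with the pattern above: because `\s` in the label class also matches newlines, the first match (index 0) has the label 'Example text \nTECHNICAL PARTICULARS\nLength oa'. A possible variation, offered as a sketch rather than a definitive fix, anchors each match to the start of a line and keeps the label on that line. The group names `label` and `value` are my own choice:

```python
import re

text_example = '\nExample text \nTECHNICAL PARTICULARS\nLength oa: ...............189.9m\nLength bp: ........176m\nBreadth moulded: .......26.4m\nDepth moulded to main deck: ....9.2m\n'

# With re.MULTILINE, ^ matches at the start of every line.
# [^:\n]+ keeps the label on a single line; the dot leaders are skipped by \.*
pattern = re.compile(
    r'^(?P<label>[^:\n]+):\s*\.*\s*(?P<value>\d+(?:\.\d+)?)',
    re.MULTILINE,
)

# Label and value come from the same match, so they cannot get out of step.
pairs = {m.group('label'): float(m.group('value'))
         for m in pattern.finditer(text_example)}
print(pairs)
```

Lines without a colon followed by a number, such as the 'TECHNICAL PARTICULARS' heading, produce no match and are skipped.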
Upvotes: 0