I am using the following regular expression to extract data from a file. It works fine as long as the data I am extracting contains all 3 elements of the regex. If even one element is missing, the regex skips the whole record. How do I change this behavior so that a record with a missing value is not skipped, and the missing value is filled with 0 or null instead?
bC_NUMBER = 1
bS_ID = 1
bTRANSACTION_AMOUNT = 1
rC_NUMBER = r"number:\s(\d+\*+\d+).*?"
rS_ID = r"ID:\s*(\d*).*?"
rT_ID = r"ATM:\s(\w+).*?"
rT_AMOUNT = r"Total cash dispensed:\s*([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+).*?"
regex = rC_NUMBER*bC_NUMBER+ rS_ID*bS_ID + rT_AMOUNT*bTRANSACTION_AMOUNT
Current output:
[('99280*********8823', '182', '40000', 'MGA'), ('99280*********8823', '182', '40000', 'MGA')]
Desired output:
[('99280*********8823', '182', '40000', 'MGA'), ('6700*********8823', '177', 'null or 0', 'null or 0'), ('99280*********8823', '182', '40000', 'MGA')]
Upvotes: 0
Views: 117
Reputation: 627327
You can use a regex like
(?s)Card number:\s(\d+\*+\d+)(?:(?!Card number:).)*?ID:\s*(\d*)(?:(?:(?!Card number:).)*?Total cash dispensed:\s*([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+))?
See the regex demo.
NOTE: 1) the .*? is turned into a (?:(?!Card number:).)*? tempered greedy token, which matches any char that does not start a new "Card number:" substring, so a match can never run into the next record, 2) the last part is now optional, (?:(?:(?!Card number:).)*?Total cash dispensed:\s*([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+))?, and 3) I am using the (?s) flag (in code, re.S or re.DOTALL) so that the . can match any char, including line break chars.
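To see what the tempered greedy token changes compared with a plain .*?, here is a minimal sketch. The sample text is made up for illustration (it is an assumption, not the real file from the question), and \w+ is used for the amount and currency just to keep the patterns short. In this sample the second record has no "Total cash dispensed" line.

```python
import re

# Hypothetical sample data: the second record has no amount line.
sample = (
    "Card number: 1111***2222\nID: 10\nTotal cash dispensed: 500 USD\n"
    "Card number: 3333***4444\nID: 20\n"
    "Card number: 5555***6666\nID: 30\nTotal cash dispensed: 700 EUR\n"
)

# Plain lazy dot: nothing stops it from running past the next
# "Card number:" while it looks for an amount.
lazy = (
    r"Card number:\s(\d+\*+\d+).*?ID:\s*(\d*)"
    r".*?Total cash dispensed:\s*(\w+)\s+(\w+)"
)

# Tempered token: a char is consumed only if "Card number:" does not
# start at that position, so the match stays inside one record.
T = r"(?:(?!Card number:).)*?"
tempered = (
    r"Card number:\s(\d+\*+\d+)" + T + r"ID:\s*(\d*)"
    + r"(?:" + T + r"Total cash dispensed:\s*(\w+)\s+(\w+))?"
)

lazy_hits = re.findall(lazy, sample, re.S)
tempered_hits = re.findall(tempered, sample, re.S)

print(lazy_hits)
print(tempered_hits)
```

On this sample the lazy pattern returns only two tuples, and the second one pairs card 3333***4444 with the 700 EUR that belongs to the following record. The tempered pattern returns three tuples, with empty strings for the amount and currency of the second record.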
See the Python demo:
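import re

test_str = "YOUR_STRING_HERE"  # placeholder: replace with the text read from the file

# Flags: 1 keeps a part of the pattern, 0 drops it (a string multiplied by 0 is "")
bC_NUMBER = 1
bS_ID = 1
bTRANSACTION_AMOUNT = 1

rC_NUMBER = r"Card number:\s(\d+\*+\d+)"
rS_ID = r"(?:(?!Card number:).)*?ID:\s*(\d*)"
rT_ID = r"(?:(?!Card number:).)*?ATM:\s(\w+)"  # defined, but not used in the combined regex below
# The whole amount part sits in an optional group, so records without it still match
rT_AMOUNT = r"(?:(?:(?!Card number:).)*?Total cash dispensed:\s*([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+))?"

regex = rC_NUMBER * bC_NUMBER + rS_ID * bS_ID + rT_AMOUNT * bTRANSACTION_AMOUNT
print(re.findall(regex, test_str, re.S))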
Output:
[('99280*********8823', '182', '40000', 'MGA'), ('6700*********8823', '177', '', ''), ('99280*********8823', '182', '40000', 'MGA')]
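Note that re.findall returns an empty string for a group that did not take part in the match, not 0 or null. If you need null (None in Python) or "0" there, one option is a small post-processing step. This is only a sketch: fill_missing is a hypothetical helper name, and the rows list simply repeats the output shown above.

```python
# Rows as returned by re.findall above; '' marks a group that did not match.
rows = [
    ('99280*********8823', '182', '40000', 'MGA'),
    ('6700*********8823', '177', '', ''),
    ('99280*********8823', '182', '40000', 'MGA'),
]

def fill_missing(rows, default=None):
    """Return a copy of rows with every empty string replaced by default."""
    return [tuple(v if v != "" else default for v in row) for row in rows]

filled = fill_missing(rows)            # missing values become None
filled_zero = fill_missing(rows, "0")  # or use "0" instead

print(filled)
print(filled_zero)
```

Keep in mind that the ID group is (\d*), so it can also come back empty, in which case it is replaced by the default as well.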
Upvotes: 1