Infinitexistence

Reputation: 155

Regex to match string followed by varying formats

I am working with data from a DB which produces information on transactions.

The problem is that a transaction can have any number of related attributes, and the transaction details are repeated on a new line for each attribute.

Each line has the format:

[Transaction ID] [tab] [Attribute name] [tab] [Attribute value] [tab] [date]

Example:

11111    Amount    12000
11111    Reference    101010
11111    Operator    John
11111    Subject    Credit
11111    Notes    XXXXXXXX
11112    Amount    75000
11112    Reference    202020
11112    Operator    Will

I am trying to find a regular expression for EACH attribute that matches the following logic:

"Amount" - followed by TAB - followed by variable length number - followed by TAB

"Reference" - followed by TAB - followed by variable length number - followed by TAB

"Operator" - followed by TAB - followed by variable length string - followed by TAB

"Subject" - followed by TAB - followed by variable length string - followed by TAB

"Notes" - followed by TAB - followed by variable length string - followed by TAB

Upvotes: 1

Views: 100

Answers (2)

Marc Lambrichs

Reputation: 2882

This answer is aimed at reading all the attributes that belong to the same transaction id in a single match. You can try it out at regex101.com

(?s)                                    // dot matches newline
(?<tid>\d+)                             // transactionid 
\t
(?:Amount\t(?<amount>\d+))              // amount
.\1\t                                   // newline, transactionid, tab
(?:Reference\t(?<ref>\d+))              // reference
.\1\t                                   // newline, transactionid, tab
(?:Operator\t(?<ope>\w+))               // operator
(?:.\1\t(?:Subject\t(?<sub>\w+)))?      // possible subject
(?:.\1\t(?:Notes\t(?<not>\w+)))?        // possible notes
(?!\1)                                  // negative lookahead

Put simply, you keep reading attributes until the transaction id changes.
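As a rough illustration, here is how the same idea might look in Python. This is a sketch under a few assumptions, not a drop-in port: Python's re module spells named groups (?P<name>...) and named backreferences (?P=name), the group name "notes" replaces "not", and the "$" anchor plus the "\n(?P=tid)\t" form of the lookahead are additions of mine so that a value cannot be shortened by backtracking. The sample data assumes real tab separators and no trailing date column, as in the question's example.

```python
import re

# One match per transaction; each attribute line must repeat the same id.
TRANSACTION_RE = re.compile(
    r"""
    ^(?P<tid>\d+)\t Amount\t(?P<amount>\d+)          # first line: id and amount
    \n(?P=tid)\t Reference\t(?P<ref>\d+)             # same id, reference
    \n(?P=tid)\t Operator\t(?P<ope>\w+)              # same id, operator
    (?:\n(?P=tid)\t Subject\t(?P<sub>\w+))?          # optional subject
    (?:\n(?P=tid)\t Notes\t(?P<notes>\w+))?          # optional notes
    $                                                # value must end the line (assumption)
    (?!\n(?P=tid)\t)                                 # next line must not carry the same id
    """,
    re.VERBOSE | re.MULTILINE,
)

# Sample rows from the question, written with explicit tabs (assumed).
SAMPLE = (
    "11111\tAmount\t12000\n"
    "11111\tReference\t101010\n"
    "11111\tOperator\tJohn\n"
    "11111\tSubject\tCredit\n"
    "11111\tNotes\tXXXXXXXX\n"
    "11112\tAmount\t75000\n"
    "11112\tReference\t202020\n"
    "11112\tOperator\tWill\n"
)

transactions = [m.groupdict() for m in TRANSACTION_RE.finditer(SAMPLE)]
for t in transactions:
    print(t)
```

Optional attributes that are missing for a transaction come back as None in the resulting dictionaries.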

Upvotes: 1

Marc Lambrichs

Reputation: 2882

A regex like this

(?<transactionid>\d+)\t(?<attribute>Amount|Reference|Operator|Subject|Notes)\t(?<value>\w+)

will do.

Look at regex101.com

Explanation:

(?<transactionid>\d+)                                   // transaction id
\t                                                      // followed by tab
(?<attribute>Amount|Reference|Operator|Subject|Notes)   // attribute
\t                                                      // followed by tab
(?<value>\w+)                                           // value
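To show the pattern in use, here is a minimal Python sketch. It assumes the fields are separated by real tab characters, and it rewrites the named groups as (?P<name>...) because that is the syntax Python's re module accepts. The "^" anchor with re.MULTILINE is an addition of mine so that each match starts at the beginning of a line.

```python
import re

# Same three-part pattern as above, in Python's named-group syntax.
LINE_RE = re.compile(
    r"^(?P<transactionid>\d+)\t"
    r"(?P<attribute>Amount|Reference|Operator|Subject|Notes)\t"
    r"(?P<value>\w+)",
    re.MULTILINE,
)

# A few rows from the question, written with explicit tabs (assumed).
SAMPLE = (
    "11111\tAmount\t12000\n"
    "11111\tReference\t101010\n"
    "11111\tOperator\tJohn\n"
    "11112\tAmount\t75000\n"
)

# One dictionary per matched line.
rows = [m.groupdict() for m in LINE_RE.finditer(SAMPLE)]
for row in rows:
    print(row["transactionid"], row["attribute"], row["value"])
```

Note that \w+ stops at the first space or punctuation character, so if values such as Notes can contain spaces, a class like [^\t\n]+ may be a better fit for the value group.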

Upvotes: 0
