Reputation: 65
I am trying to do some transformation and I’m stuck. Here goes the problem description.
Below is the pipe delimited file. I have masked data!
AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp
61470003||30.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|[email protected]||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46
In this file, we have got CalculatedCurrency
field where we have multiple values delimited by a comma. The file also has field CalculatedCurrencyAmount
which too has multiple values delimited by a comma. But I need to pick up only that currency value from CalculatedCurrency
field which belongs to
BranchCurrency
(another field in the file) and of course corresponding CalculatedCurrencyAmount
for that Currency.
Required output : -
AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp|ActualCurrency|ActualAmount 61470003||30.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|[email protected]||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46|EUR|30.00
Please help.
Snaplogic Python Script
from com.snaplogic.scripting.language import ScriptHook
from com.snaplogic.scripting.language.ScriptHook import *
import csv
class TransformScript(ScriptHook):
def __init__(self, input, output, error, log):
self.input = input
self.output = output
self.error = error
self.log = log
def execute(self):
self.log.info("Executing Transform script")
while self.input.hasNext():
data = self.input.next()
branch_currency = data['BranchCurrency']
calc_currency = data['CalculatedCurrency'].split(',')
calc_currency_amount = data['CalculatedCurrencyAmount'].split(',')
result = None
for i, name in enumerate(calc_currency):
result = calc_currency_amount[i] if name == branch_currency else result
data["CalculatedCurrencyAmount"] = result
result1 = calc_currency[i] if name == branch_currency else result
data["CalculatedCurrency"] = result1
try:
data["mathTryCatch"] = data["counter2"].longValue() + 33
self.output.write(data)
except Exception as e:
data["errorMessage"] = e.message
self.error.write(data)
self.log.info("Finished executing the Transform script")
hook = TransformScript(input, output, error, log)
Upvotes: 0
Views: 364
Reputation: 4131
You don't need to use the script snap here at all. Writing scripts for transformations all the time hampers the performance and defeats the purpose of an IPaaS tool altogether. The mapper should suffice.
I created the following test pipeline for this problem.
I saved the data provided in this question in a file and saved it in SnapLogic for the test. In the pipeline, I parsed it using a CSV parser.
Following is the parsed result.
Then I used a mapper for doing the required transformation.
Following is the expression for getting the actual amount.
$CalculatedCurrency.split(',').indexOf($BranchCurrency) >= 0 ? $CalculatedCurrencyAmount.split(',')[$CalculatedCurrency.split(',').indexOf($BranchCurrency)] : null
Following is the result.
Avoid writing scripts for problems that can be solved using mappers.
Upvotes: 0
Reputation: 13249
Using awk:
awk 'BEGIN{FS=OFS="|"}
NR==1{print $0,"ActualCurrency","ActualAmount";next}
{n=split($9,a,",");split($10,b,",");for(i=1;i<=n;i++) if(a[i]==$5) print $0,$5,b[i]}' file
BEGIN{FS=OFS="|"}
sets the input and ouput delimiter to |
.
The NR==1
statement takes care of the header by adding the 2 strings.
The 9th and 10th fields are splitted based on the ,
separator, and values are set inside the arrays a
and b
.
The for
loop is trying to find the value of the a
array corresponding to the 5th field. If found, the corresponding value of the b
is printed.
Upvotes: 0
Reputation: 7716
I know, the op asked for unix shell, but as an alternative option I show some code to do it using python. (Obviously this code can be heavily improved also.) The great advantage is readability, for instance, that you can address your data via name. Or the code, which is much more readable at all, than doing this with awk et al.
Save your data in data.psv
, write the following script into a file main.py
. I've tested it using python3 and python2. Both works. Run the script using python main.py
.
Update: I've extended the script to parse all lines. In the example data, I've set BranchCurrency to EUR in the first line and USD in the secondline, as a dummy test.
File: main.py
import csv
def parse_line(row):
branch_currency = row['BranchCurrency']
calc_currency = row['CalculatedCurrency'].split(',')
calc_currency_amount = row['CalculatedCurrencyAmount'].split(',')
result = None
for i, name in enumerate(calc_currency):
result = calc_currency_amount[i] if name == branch_currency else result
return result
def main():
with open('data.psv') as f:
reader = csv.DictReader(f, delimiter='|')
for row in reader:
print(parse_line(row))
if __name__ == '__main__':
main()
Example Data:
[:~] $ cat data.psv
AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp
61470003||35.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|[email protected]||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46
61470003||35.00|null|USD|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|[email protected]||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46
Example Run:
[:~] $ python main.py
30.00
35.20
Upvotes: 0
Reputation: 141010
Using bash with some arrays:
arr_find() {
echo $(( $(printf "%s\0" "${@:2}" | grep -Fnxz "$1" | cut -d: -f1) - 1 ))
}
IFS='|' read -r -a headers
while IFS='|' read -r "${headers[@]}"; do
IFS=',' read -r -a CalculatedCurrency <<<"$CalculatedCurrency"
IFS=',' read -r -a CalculatedCurrencyAmount <<<"$CalculatedCurrencyAmount"
idx=$(arr_find "$BranchCurrency" "${CalculatedCurrency[@]}")
echo "BranchCurrency is $BranchCurrency. Hence CalculatedCurrency will be ${CalculatedCurrency[$idx]} and CalculatedCurrencyAmount will have to be ${CalculatedCurrencyAmount[$idx]}."
done
First I read all headers names. Then read all values into headers. Then read CalculatedCurrency* correctly, cause they are separated by ','. Then I find the element number which is equal to BranchCurrency inside CalculatedCurrency. Having the element index and arrays, I can just print the output.
Upvotes: 0