Reputation: 101
I have a file with multiple lines which is structured as seen below
MSH|^~\&|Xatidok|V10.0.2.000|OSestra|x-tention|201203060855||ADT^A03|2914|P|2.3^AA&BB
EVN|A03|201203060855|201203060855|01|Fidani
PID|||00019380|2012049008^120005548^302830|PATIDOK-person^InRid^|Rudi|19111111|F|||Rose |A|Pens.
NK1||IRergrun^RROSlf^||Rose ^^Wels^^4600^A|07242123123|||||||||||||||||||||||||||||||
PV1||I|1212^G442^G442-||0|||||||||||2012049008|General|||||||||||||||||||12|||||201202060927|||||||
So basically there are rows of data separated by pipes (|), and I want to parse them with a bash script.
So briefly this is the structure
The idea of running the script is: ./script.sh filename command
command should look like: MSH.2.3.4 or shorter
Meaning: access the segment which starts with MSH, field number 2, component number 3, subcomponent 4
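To make the command format concrete, here is a minimal sketch of how such a selector could be split into its parts. The function name parse_selector is hypothetical, not something from the question; trailing parts that are left out simply stay empty.

```shell
#!/bin/bash
# Hypothetical helper: split a selector such as "MSH.2.3.4" into its parts.
parse_selector() {
    local selector=$1
    local -a parts
    # Split on "." into an array; -r keeps backslashes literal.
    IFS='.' read -r -a parts <<< "$selector"
    # Parts that were not given expand to empty strings, so "PID.5" is accepted too.
    printf 'segment=%s field=%s component=%s subcomponent=%s\n' \
        "${parts[0]}" "${parts[1]}" "${parts[2]}" "${parts[3]}"
}

parse_selector "MSH.2.3.4"   # segment=MSH field=2 component=3 subcomponent=4
parse_selector "PID.5"       # segment=PID field=5 component= subcomponent=
```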
So my logic of parsing is as follows: I want to create an array which stores every row (segment) from the file as follows:
#!/bin/bash
file_to_be_parsed=$1
command=$2
counter=0
# Read the file line by line into an array called segments,
# so segments[] holds every line (segment) of the file, indexed from 0 to X.
# -r keeps backslashes (as in ^~\&) literal.
while IFS= read -r segment; do
    segments[counter]=$segment
    counter=$((counter+1))
done < "$file_to_be_parsed"
SECOND: My second step is to separate each array member one step further based on the delimiter, and I can do it with:
IFS="|" read -r field <<< (here I can't figure out)
but I can't actually create a 2D array in bash, even though I searched a lot. Then I could access the specific fields ...
So can someone help me further separate these array members into fields ...
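Since bash has no true 2D arrays, one possible way around this is to keep the lines in a flat array and split a line into fields only when it is needed. The sketch below assumes bash 4+ (for mapfile) and uses two lines from the sample as inline data instead of a file; get_field is a hypothetical name.

```shell
#!/bin/bash
# Sample data standing in for the file (two segments from the question).
sample=$'EVN|A03|201203060855|201203060855|01|Fidani\nPID|||00019380|2012049008^120005548^302830|PATIDOK-person^InRid^|Rudi'

# mapfile (bash 4+) reads every line into one array element.
mapfile -t segments <<< "$sample"

# Print field INDEX (0 = segment name) of the first segment called NAME.
get_field() {
    local name=$1 index=$2 line
    local -a fields
    for line in "${segments[@]}"; do
        # Split the current line on "|" into a temporary array.
        IFS='|' read -r -a fields <<< "$line"
        if [ "${fields[0]}" = "$name" ]; then
            printf '%s\n' "${fields[$index]}"
            return 0
        fi
    done
    return 1   # no segment with that name
}

get_field PID 4   # prints 2012049008^120005548^302830
get_field EVN 5   # prints Fidani
```

Here the second dimension only exists for the line currently being inspected, which is usually enough for a single lookup.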
Upvotes: 4
Views: 9019
Reputation: 14422
For a pure bash-only solution, you can use bash arrays to split the line into fields, components and subcomponents. Provided that you do not have to run the code on large data sets, this should be OK.
Consider switching to a more powerful engine (awk, python, perl) for large problems.
#! /bin/bash
file=$1
command=$2
# Split command into keys, so that items are k[0], k[1], ...
IFS="." read -r -a k <<<"$command"
# Look for the line whose first field matches k[0]
# (-r keeps backslashes such as the one in ^~\& literal)
while IFS='|' read -r -a fa ; do
    # Skip to next row if no match.
    [ "${fa[0]}" = "${k[0]}" ] || continue
    # Field
    v=${fa[${k[1]}-1]}
    # Component
    if [ "${#k[@]}" -gt 2 ] ; then
        IFS="^" read -r -a fb <<<"$v"
        v=${fb[${k[2]}-1]}
    fi
    # Sub component
    if [ "${#k[@]}" -gt 3 ] ; then
        IFS="&" read -r -a fc <<<"$v"
        v=${fc[${k[3]}-1]}
    fi
    echo "V=$v"
    break
done <"$file"
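One detail worth keeping in mind when splitting HL7 lines with read: without -r, bash treats backslashes as escape characters, which changes the MSH encoding characters. A small sketch of the difference (the variable names are only for illustration):

```shell
#!/bin/bash
line='MSH|^~\&|Xatidok'

# Without -r the backslash is consumed as an escape character.
IFS='|' read -a without_r <<< "$line"
# With -r the field is kept exactly as it appears in the file.
IFS='|' read -r -a with_r <<< "$line"

printf '%s\n' "${without_r[1]}"   # prints ^~&
printf '%s\n' "${with_r[1]}"      # prints ^~\&
```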
Upvotes: 2
Reputation: 4865
This is a classic awk (standard Linux gawk) problem.
Here is a simple script that verifies the input arguments and parses only the required field, component and subcomponent using awk's internal split function.
The user is encouraged to simplify the script's output layout.
As for the script's arguments, all are mandatory (some might be ignored), and the input.txt file must come last.
input.txt
MSH|^~\&|Xatidok|V10.0.2.000|OSestra|x-tention|201203060855||ADT^A03|2914|P|2.3^AA&BB
EVN|A03|201203060855|201203060855|01|Fidani
PID|||00019380|2012049008^120005548^302830|PATIDOK-person^InRid^|Rudi|19111111|F|||Rose |A|Pens.
NK1||IRergrun^RROSlf^||Rose ^^Wels^^4600^A|07242123123|||||||||||||||||||||||||||||||
PV1||I|1212^G442^G442-||0|||||||||||2012049008|General|||||||||||||||||||12|||||201202060927|||||||
script.awk
BEGIN {FS="|"; componentSeperator="^"; subComponentSeperator="&"}
# Check once that all four command-line variables were supplied.
function readArgs() {
    if (passedReadArgs == 1) return;
    if (length(field) == 0) {print "Missing field string argument, exiting."; exit;}
    if (length(fieldNumber) == 0) {print "Missing fieldNumber number argument, exiting."; exit;}
    if (length(componentNumber) == 0) {print "Missing componentNumber number argument, exiting."; exit;}
    if (length(subComponentNumber) == 0) {print "Missing subComponentNumber number argument, exiting."; exit;}
    passedReadArgs = 1;
}
{
    readArgs();
    # Only process the segment whose name (first field) equals the requested one.
    if ($1 != field) next;
    print "Arguments: "field, fieldNumber, componentNumber, subComponentNumber;
    print "field["fieldNumber"] = "$fieldNumber;
    split($fieldNumber, componentsArr, componentSeperator);
    if (length(componentsArr[componentNumber]) > 0) {
        print "component["componentNumber"] = "componentsArr[componentNumber];
        split(componentsArr[componentNumber], subComponentsArr, subComponentSeperator);
        if (length(subComponentsArr[subComponentNumber]) > 0) print "subComponent["subComponentNumber"] = "subComponentsArr[subComponentNumber];
    }
}
Running the script:
awk -f script.awk field="MSH" fieldNumber=12 componentNumber=2 subComponentNumber=2 input.txt
Output (the second and third blocks come from runs with other arguments):
Arguments: MSH 12 2 2
field[12] = 2.3^AA&BB
component[2] = AA&BB
subComponent[2] = BB
Arguments: NK1 5 3 2
field[5] = Rose ^^Wels^^4600^A
component[3] = Wels
Arguments: PID 7 3 2
field[7] = Rudi
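If the dotted syntax from the question (MSH.12.2.2) is preferred over separate variables, a thin bash wrapper around awk could look roughly like the sketch below. The function name hl7_get and the awk variable names are hypothetical. It follows awk's numbering, where $1 is the segment name, and prints only the selected value.

```shell
#!/bin/bash
# Hypothetical wrapper: hl7_get FILE SELECTOR, e.g. hl7_get input.txt MSH.12.2.2
hl7_get() {
    local file=$1 selector=$2
    local -a k
    # k[0]=segment name, k[1]=field, k[2]=component, k[3]=subcomponent
    IFS='.' read -r -a k <<< "$selector"
    awk -F'|' -v seg="${k[0]}" -v f="${k[1]}" -v c="${k[2]}" -v s="${k[3]}" '
        $1 == seg {
            v = $f
            # "\\^" is an escaped caret, so it is matched literally in any awk.
            if (c != "") { split(v, comps, "\\^"); v = comps[c] }
            if (s != "") { split(v, subs, "&"); v = subs[s] }
            print v
            exit    # stop after the first matching segment
        }
    ' "$file"
}
```

Usage, assuming input.txt holds the sample above: hl7_get input.txt MSH.12.2.2 prints BB, and hl7_get input.txt NK1.5.3 prints Wels.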
Upvotes: 4