asha
asha

Reputation: 101

Parsing a file with bash script

I have a file with multiple lines which is structured as seen below

MSH|^~\&|Xatidok|V10.0.2.000|OSestra|x-tention|201203060855||ADT^A03|2914|P|2.3^AA&BB
EVN|A03|201203060855|201203060855|01|Fidani
PID|||00019380|2012049008^120005548^302830|PATIDOK-person^InRid^|Rudi|19111111|F|||Rose |A|Pens.
NK1||IRergrun^RROSlf^||Rose ^^Wels^^4600^A|07242123123|||||||||||||||||||||||||||||||
PV1||I|1212^G442^G442-||0|||||||||||2012049008|General|||||||||||||||||||12|||||201202060927|||||||

So basically there are rows with data on it seperated with pipes (|) and i want to parse them by writing a bash script.

So briefly this is the structure

The idea of running the sript is: ./script.sh filename command

command should look like: MSH.2.3.4 or shorter

Meaning: Access the field which starts with MSH, Field number 2, Component number 3, Sub component 4

So my logic of parsing is as follows: I want to create an array which stores every row (segment) from the file as follows:

#!/bin/bash

file_to_be_parsed=$1
command=$2
counter=0

#read the file and split it into lines (segments) by creating an array called segments which holds all the lines (segment) in it
#array segments[] holds every line/segment of the file indexed from 0 to X

while IFS= read -a segment; do
     segments[$counter]=$segment
     counter=$((counter+1)); 
done < $file_to_be_parsed

SECOND: My second step is to seperate each array member one step further based on the delimiter and i can do it by:

IFS="|" read -r field <<< (here i can't figure out)

but i can't actually create 2D array in bash even though I searched a lot. Then i can access the specific fields ...

So can someone help me how to further seperate these array members into fields ...

Upvotes: 4

Views: 9019

Answers (2)

dash-o
dash-o

Reputation: 14422

Fr puer bash-only solution, can use bash arrays to split the line into fields, components, sub components. Provided that you do not have to run the code on large data sets, should be OK.

Considers switching to more powerful engine (awk, python, perl) for large problems.

#! /bin/bash
file=$1
command=$2
   # Split command into key, so that items are key[0], key[1], ...
IFS="." read -a k <<<"$command"

  # Look for matching line to k[0]
while IFS='|' read -a fa ; do
  # Skip to next row if no match.
  [ "${fa[0]}" = "${k[0]}" ] || continue ;
  # Field
  v=${fa[${k[1]}-1]}
  # Component
  if [ "${#k[@]}" -gt 2 ] ; then
      IFS="^" read -a fb <<<"$v"
      v=${fb[${k[2]}-1]}
  fi
  # Sub component
  if [ "${#k[@]}" -gt 3 ] ; then
      IFS="&" read -a fc <<<"$v"
      v=${fc[${k[3]}-1]}
  fi
  echo "V=$v" ;
  break
done <"$file"

Upvotes: 2

Dudi Boy
Dudi Boy

Reputation: 4865

This is a classic awk (standard Linux gawk) problem.

Here is a simple script that verify input arguments and parse only the required fields, component and subComponent using awk's internal split function.

The user is encouraged to simplify the script output layouts.

As for script's arguments, all are mandatory (some might be ignored), the input.txt file must be last.

input.txt

MSH|^~\&|Xatidok|V10.0.2.000|OSestra|x-tention|201203060855||ADT^A03|2914|P|2.3^AA&BB
EVN|A03|201203060855|201203060855|01|Fidani
PID|||00019380|2012049008^120005548^302830|PATIDOK-person^InRid^|Rudi|19111111|F|||Rose |A|Pens.
NK1||IRergrun^RROSlf^||Rose ^^Wels^^4600^A|07242123123|||||||||||||||||||||||||||||||
PV1||I|1212^G442^G442-||0|||||||||||2012049008|General|||||||||||||||||||12|||||201202060927|||||||

script.awk

BEGIN {FS="|"; componentSeperator="^"; subComponentSeperator="&"}
function readArgs() {
     if (passedReadArgs == 1) return;
     if (length(field) == 0) {print "Missing field string argument, exiting."; exit;}
     if (length(fieldNumber) == 0) {print "Missing fieldNumber number argument, exiting."; exit;}
     if (length(componentNumber) == 0) {print "Missing componentNumber number argument, exiting."; exit;}
     if (length(subComponentNumber) == 0) {print "Missing subComponentNumber number argument, exiting."; exit;}
     passedReadArgs = 1;
}
{
     readArgs();
     if ($0 !~ field) next;

     print "Arguments: "field, fieldNumber, componentNumber, subComponentNumber;

     print "field["fieldNumber"] = "$fieldNumber;

     split($fieldNumber, componentsArr, componentSeperator);
     if (length(componentsArr[componentNumber]) > 0) {
          print "component["componentNumber"] = "componentsArr[componentNumber];
          split(componentsArr[componentNumber], subComponentsArr, subComponentSeperator);
          if (length(subComponentsArr[subComponentNumber]) > 0) print "subComponent["subComponentNumber"] = "subComponentsArr[subComponentNumber];
     }
}

running the script.awk script:

awk -f script.awk field="MSH" fieldNumber=11 componentNumber=2 subComponentNumber=2 input.txt

output:

Arguments: MSH 12 2 2
field[12] = 2.3^AA&BB
component[2] = AA&BB
subComponent[2] = BB

Arguments: NK1 5 3 2
field[5] = Rose ^^Wels^^4600^A
component[3] = Wels


Arguments: PID 7 3 2
field[7] = Rudi

Upvotes: 4

Related Questions