IMTheNachoMan

Reputation: 5833

read file and extract variables based on what is in the line

I have a file that looks like this:

$ cat file_test
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven

I want to go through each line and extract specific "variables" to use in the loop. And if a line doesn't have a variable then set it to an empty string.

So, for the above example, let's say I want to extract the variables A, B, and C. Then, for each line, the loop would have this:

  1. garbage text A=one B=two C=three D=four
    • A = "one"
    • B = "two"
    • C = "three"
  2. garbage text A= B=six D=seven
    • A = ""
    • B = "six"
    • C = ""
  3. garbage text A=eight E=nine D=ten B=eleven
    • A = "eight"
    • B = "eleven"
    • C = ""

My original plan was to use sed but that won't work since the order of the "variables" is not consistent (the last line for example) and a "variable" may be missing (the second line for example).

My next thought is to go through line by line, then split the line into fields using awk and set variables based on each field but I have no clue where or how to start.

I'm open to other ideas or better suggestions.

Upvotes: 0

Views: 649

Answers (7)

Ed Morton

Reputation: 204164

It's unclear whether you're trying to set awk variables or shell variables, but here's how to populate an associative awk array and then use that to populate an associative shell array:

$ cat tst.awk
BEGIN {
    numKeys = split("A B C",keys)
}
{
    delete f
    for (i=1; i<=NF; i++) {
        if ( split($i,t,/=/) == 2 ) {
            f[t[1]] = t[2]
        }
    }
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        printf "[%s]=\"%s\"%s", key, f[key], (keyNr<numKeys ? OFS : ORS)
    }
}

$ awk -f tst.awk file
[A]="one" [B]="two" [C]="three"
[A]="" [B]="six" [C]=""
[A]="eight" [B]="eleven" [C]=""

$  while IFS= read -r out; do declare -A arr="( $out )"; declare -p arr; done < <(awk -f tst.awk file)
declare -A arr=([A]="one" [B]="two" [C]="three" )
declare -A arr=([A]="" [B]="six" [C]="" )
declare -A arr=([A]="eight" [B]="eleven" [C]="" )

$ echo "${arr["A"]}"
eight

Upvotes: 0

NeronLeVelu

Reputation: 10039

A generic, self-documented awk solution. It assumes that the variable separator is = and that = appears neither in the text before the variables nor in the variable content itself.

awk 'BEGIN {
        # load the list of variable and order to print
        VarSize = split( "A B C", aIdx )
        # create a pattern filter for variable catch in lines
        for ( Idx in aIdx ) VarEntry = ( VarEntry ? ( VarEntry "|^" ) : "^" ) aIdx[Idx] "="
        }

        {
        # reset variable values
        split( "", aVar )
        # for each part of the line
        for ( Fld=1; Fld<=NF; Fld++ ) {
           # if part is a variable assignment
           if( $Fld ~ VarEntry ) {
              # separate variable name and content in array
              split( $Fld, aTemp, /=/ )
              # put variable content in corresponding variable name container
              aVar[aTemp[1]] = aTemp[2]
              }
           }
        # print all variable content (empty or not) found on this line,
        # in list order (the order of a for-in loop is unspecified in awk)
        for ( Idx = 1; Idx <= VarSize; Idx++ ) printf( "%s = \042%s\042\n", aIdx[Idx], aVar[aIdx[Idx]] )
        }
      ' YourFile

Upvotes: 0

stack0114106

Reputation: 8711

Another Perl solution:

perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() '

with the input file

$ perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() ' file_test
A = one
B = two
C = three
A =
B = six
C =
A = eight
B = eleven
C =
$

Upvotes: 0

RavinderSingh13

Reputation: 133650

My first three solutions assume that you need shell variables holding the values of A, B, and C, rather than simply printing them. If that is the case, the following may help you.



1st Solution: This assumes that your variables A, B, and C always appear in the same field positions.

while read first second third fourth fifth sixth
do
  echo $third,$fourth,$fifth        ##Printing values here.
  a_var=${third#*=}
  b_var=${fourth#*=}
  c_var=${fifth#*=}
  echo "Using new values of variables here...."
  echo "NEW A="$a_var
  echo "NEW B="$b_var
  echo "NEW C="$c_var
done < "Input_file"

It simply prints the variable values for each line, since you have not said what you are going to do with them; adapt the loop body to your use case.



2nd Solution: This also assumes that the variables appear in the same order, but it checks whether the 3rd field starts with A, the 4th with B, and the 5th with C, and leaves the corresponding variable empty when it does not.

while read first second third fourth fifth sixth
do
  echo $third,$fourth,$fifth        ##Printing values here.
  a_var=$(echo "$third" | awk '$0 ~ /^A/{sub(/.*=/,"");print}')
  b_var=$(echo "$fourth" | awk '$0 ~ /^B/{sub(/.*=/,"");print}')
  c_var=$(echo "$fifth" | awk '$0 ~ /^C/{sub(/.*=/,"");print}')
  echo "Using new values of variables here...."
  echo "NEW A="$a_var
  echo "NEW B="$b_var
  echo "NEW C="$c_var
done < "Input_file"


3rd Solution: This looks like the best fit for your requirement, though I am not sure how efficient it is (I am still looking into whether it can be improved). It does not depend on the order of A, B, and C: each one is matched wherever it appears in the line. If a match is found, the variable gets its value; otherwise it is left empty.

while read line
do
  a_var=$(echo "$line" | awk 'match($0,/A=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
  b_var=$(echo "$line" | awk 'match($0,/B=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
  c_var=$(echo "$line" | awk 'match($0,/C=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
  echo "Using new values of variables here...."
  echo "NEW A="$a_var
  echo "NEW B="$b_var
  echo "NEW C="$c_var
done < "Input_file"

Output will be as follows.

Using new values of variables here....
NEW A=one
NEW B=two
NEW C=three
Using new values of variables here....
NEW A=
NEW B=six
NEW C=
Using new values of variables here....
NEW A=eight
NEW B=eleven
NEW C=


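EDIT1: In case you simply want to print the values of A, B, and C, try the following. Note that it stores the matches by position, so it assumes A, B, and C appear in that order; if A is missing from a line, the value of B is printed as A.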

awk '{
 for(i=1;i<=NF;i++){
   if($i ~ /[ABCabc]=/){
     sub(/.*=/,"",$i)
     a[++count]=$i
   }
 }
 print "A="a[1] ORS "B=" a[2] ORS "C="a[3];count=""
 delete a
}'  Input_file

Upvotes: 0

karakfa

Reputation: 67507

right answer depends on what you're going to do with the variables.

assuming you need them as shell variables, here is a different approach

$ while IFS= read -r line; 
  do A=""; B=""; C=""; 
     source <(echo "$line" | grep -oP "(A|B|C)=\w*" ); 
     echo "A=$A B=$B C=$C"; 
  done < file

A=one B=two C=three
A= B=six C=
A=eight B=eleven C=

the trick is using source for variable declarations extracted from each line with grep. Since value assignments carry over, you need to reset them before each new line.
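
Note that source runs whatever text it is given as shell code; here the grep pattern only ever emits lines of the form A=word, which is what keeps it harmless. As a minimal sketch of the same per-line extraction without source, assuming bash and key names that contain no regex metacharacters, the value can be captured with a regex match instead. The helper name get_value is hypothetical, not something from the question:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the value of KEY=value found in a line,
# or print nothing if the key is absent. The key must be preceded by
# the start of the line or by whitespace, so "XA=..." is not taken for "A=...".
get_value() {
    local key=$1 line=$2
    local re="(^|[[:space:]])${key}=([^[:space:]]*)"
    if [[ $line =~ $re ]]; then
        printf '%s' "${BASH_REMATCH[2]}"
    fi
}

# Sample input inlined from the question so the sketch is self-contained.
while IFS= read -r line; do
    A=$(get_value A "$line")
    B=$(get_value B "$line")
    C=$(get_value C "$line")
    printf 'A="%s" B="%s" C="%s"\n' "$A" "$B" "$C"
done <<'EOF'
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven
EOF
```

Because each variable is assigned on every iteration, there is no need to reset it before the next line; the cost is one subshell per variable per line.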

Upvotes: 1

David C. Rankin

Reputation: 84579

I'm partial to the awk solution, e.g.

$ awk '{for (i = 1; i <= NF; i++) if ($i ~ /^[A-Za-z_][^=]*[=]/) print $i}' file
A=one
B=two
C=three
D=four
A=
B=six
D=seven
A=eight
E=nine
D=ten
B=eleven

Explanation

  • for (i = 1; i <= NF; i++) loop over each space separated field;
  • if ($i ~ /^[A-Za-z_][^=]*[=]/) if the field begins with a letter or underscore, followed by any number of characters other than '=' and then an '='; then
  • print $i print the field.

Upvotes: 0

tshiono

Reputation: 22042

If Perl is an option, please try:

perl -ne 'undef %a; while (/([\w]+)=([\w]*)/g) {$a{$1}=$2;}
    for ("A", "B", "C") {print "$_=\"$a{$_}\"\n";}' file_test

Output:

A="one"
B="two"
C="three"
A=""
B="six"
C=""
A="eight"
B="eleven"
C=""

It parses each line for assignments with =, stores each key-value pair in an associative array %a, and finally reports the values for A, B, and C.
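
If the values are needed inside a shell loop rather than printed, the same parse-then-look-up idea can be sketched in bash alone. This is only an illustration under stated assumptions: bash 4 or later (for associative arrays), whitespace-separated fields, and values that contain no whitespace. The names parse_line and kv are hypothetical:

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.

declare -A kv

# Hypothetical helper: fill the global associative array "kv" with every
# key=value field found in the line passed as $1.
parse_line() {
    local field key
    local -a fields
    kv=()                              # forget the previous line's values
    read -r -a fields <<< "$1"         # split the line on whitespace
    for field in "${fields[@]}"; do
        key=${field%%=*}               # text before the first '='
        # skip fields without '=' and fields with an empty key
        [[ $field == *=* && -n $key ]] || continue
        kv[$key]=${field#*=}           # text after the first '='
    done
}

# Sample input inlined from the question so the sketch is self-contained.
while IFS= read -r line; do
    parse_line "$line"
    for name in A B C; do
        # "${kv[$name]-}" expands to an empty string when the key is missing
        printf '%s="%s"\n' "$name" "${kv[$name]-}"
    done
done <<'EOF'
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven
EOF
```

Parsing every assignment once per line means that adding another wanted key only changes the list in the inner loop.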

Upvotes: 0
