Reputation: 5833
I have a file that looks like this:
$ cat file_test
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven
I want to go through each line and extract specific "variables" to use in the loop. And if a line doesn't have a variable then set it to an empty string.
So, for the above example, lets say I want to extract the variables A
, B
, and C
, then for each line, the loop would have this:
garbage text A=one B=two C=three D=four
A
= "one"B
= "two"C
= "three"garbage text A= B=six D=seven
A
= ""B
= "six"C
= ""garbage text A=eight E=nine D=ten B=eleven
A
= "eight"B
= "eleven"C
= ""My original plan was to use sed
but that won't work since the order of the "variables" is not consistent (the last line for example) and a "variable" may be missing (the second line for example).
My next thought is to go through line by line, then split the line into fields using awk
and set variables based on each field but I have no clue where or how to start.
I'm open to other ideas or better suggestions.
Upvotes: 0
Views: 649
Reputation: 204164
Its unclear whether you're trying to set awk variables or shell variables but here's how to populate an associative awk array and then use that to populate an associative shell array:
$ cat tst.awk
BEGIN {
numKeys = split("A B C",keys)
}
{
delete f
for (i=1; i<=NF; i++) {
if ( split($i,t,/=/) == 2 ) {
f[t[1]] = t[2]
}
}
for (keyNr=1; keyNr<=numKeys; keyNr++) {
key = keys[keyNr]
printf "[%s]=\"%s\"%s", key, f[key], (keyNr<numKeys ? OFS : ORS)
}
}
$ awk -f tst.awk file
[A]="one" [B]="two" [C]="three"
[A]="" [B]="six" [C]=""
[A]="eight" [B]="eleven" [C]=""
$ while IFS= read -r out; do declare -A arr="( $out )"; declare -p arr; done < <(awk -f tst.awk file)
declare -A arr=([A]="one" [B]="two" [C]="three" )
declare -A arr=([A]="" [B]="six" [C]="" )
declare -A arr=([A]="eight" [B]="eleven" [C]="" )
$ echo "${arr["A"]}"
eight
Upvotes: 0
Reputation: 10039
a generic variable awk seld documented.
Assuming variable separator are =
and not part of text before nor variable content itself.
awk 'BEGIN {
# load the list of variable and order to print
VarSize = split( "A B C", aIdx )
# create a pattern filter for variable catch in lines
for ( Idx in aIdx ) VarEntry = ( VarEntry ? ( VarEntry "|^" ) : "^" ) aIdx[Idx] "="
}
{
# reset varaible value
split( "", aVar )
# for each part of the line
for ( Fld=1; Fld<=NF; Fld++ ) {
# if part is a varaible assignation
if( $Fld ~ VarEntry ) {
# separate variable name and content in array
split( $Fld, aTemp, /=/ )
# put variable content in corresponding varaible name container
aVar[aTemp[1]] = aTemp[2]
}
}
# print all variable content (empty or not) found on this line
for ( Idx in aIdx ) printf( "%s = \042%s\042\n", aIdx[Idx], aVar[aIdx[Idx]] )
}
' YourFile
Upvotes: 0
Reputation: 8711
Another Perl
perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() '
with the input file
$ perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() ' file_test
A = one
B = two
C = three
A =
B = six
C =
A = eight
B = eleven
C =
$
Upvotes: 0
Reputation: 133650
On my first 3 solutions, I am considering that your need to use shell variables from the values of strings A,B,C
and you do not want to simply print them, if this is the case then following(s) may help you.
1st Solution: It considers that your variables A,B,C
are always coming in same field number.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=${third#*=}
b_var=${fourth#*=}
c_var=${fifth#*=}
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
It is simply printing the variables values in each line since you have NOT told what use you are going to do with these variables so I am simply printing them you could use them as per your use case too.
2nd solution: This considers that variables are coming in same order but it does check if A is coming on 3rd place or not, B is coming on 4th place or not etc and prints accordingly.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=$(echo "$third" | awk '$0 ~ /^A/{sub(/.*=/,"");print}')
b_var=$(echo "$fourth" | awk '$0 ~ /^B/{sub(/.*=/,"");print}')
c_var=$(echo "$fifth" | awk '$0 ~ /^C/{sub(/.*=/,"");print}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
3rd Solution: Which looks perfect FIT for your requirement, not sure how much efficient from coding vice(I am still analyzing more if we could do something else here too). This code will NOT look for A
,B
, or C
's order in line it will match it let them be anywhere in line, if match found it will assign value of variable OR else it will be NULL value.
while read line
do
a_var=$(echo "$line" | awk 'match($0,/A=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
b_var=$(echo "$line" | awk 'match($0,/B=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
c_var=$(echo "$line" | awk 'match($0,/C=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file
Output will be as follows.
Using new values of variables here....
NEW A=one
NEW B=two
NEW C=three
Using new values of variables here....
NEW A=
NEW B=six
NEW C=
Using new values of variables here....
NEW A=eight
NEW B=eleven
NEW C=
EDIT1: In case you simply want to print values of A,B,C
then try following.
awk '{
for(i=1;i<=NF;i++){
if($i ~ /[ABCabc]=/){
sub(/.*=/,"",$i)
a[++count]=$i
}
}
print "A="a[1] ORS "B=" a[2] ORS "C="a[3];count=""
delete a
}' Input_file
Upvotes: 0
Reputation: 67507
right answer depends on what you're going to do with the variables.
assuming you need them as shell variables, here is a different approach
$ while IFS= read -r line;
do A=""; B=""; C="";
source <(echo "$line" | grep -oP "(A|B|C)=\w*" );
echo "A=$A B=$B C=$C";
done < file
A=one B=two C=three
A= B=six C=
A=eight B=eleven C=
the trick is using source
for variable declarations extracted from each line with grep
. Since value assignments carry over, you need to reset them before each new line.
Upvotes: 1
Reputation: 84579
I'm partial to the awk
solution, e.g.
$ awk '{for (i = 1; i <= NF; i++) if ($i ~ /^[A-Za-z_][^=]*[=]/) print $i}' file
A=one
B=two
C=three
D=four
A=
B=six
D=seven
A=eight
E=nine
D=ten
B=eleven
Explanation
for (i = 1; i <= NF; i++)
loop over each space separated field;if ($i ~ /^[A-Za-z_][^=]*[=]/)
if the field begins with at least one character that is [A-Za-z_]
followed by an '='
; thenprint $i
print the field.Upvotes: 0
Reputation: 22042
If perl
is your option, please try:
perl -ne 'undef %a; while (/([\w]+)=([\w]*)/g) {$a{$1}=$2;}
for ("A", "B", "C") {print "$_=\"$a{$_}\"\n";}' file_test
Output:
A="one"
B="two"
C="three"
A=""
B="six"
C=""
A="eight"
B="eleven"
C=""
It parses each line for assignments with =
, store the key-value pair in an assoc array %a
, then finally reports the values for A, B and C.
Upvotes: 0