dhokas

Reputation: 1789

Apply a dictionary mapping to a column of a file with awk

I have a text file file.txt with several columns (tab separated), and the first column can contain indexes such as 1, 2, and 3. I want to update the first column so that 1 becomes "one", 2 becomes "two", and 3 becomes "three". I created a bash file a.sh containing:

declare -A DICO=( [1]="one" [2]="two" [3]="three" )
awk '{ $1 = ${DICO[$1]}; print }'

But now when I run cat file.txt | ./a.sh I get:

awk: cmd. line:1: { $1 = ${DICO[$1]}; print }
awk: cmd. line:1:         ^ syntax error

I'm not able to fix the syntax. Any ideas? There may also be a better way to do this in bash, but I could not think of another simple approach.

For instance, if the input is a file containing:

2       xxx
2       yyy
1       zzz
3       000
4       bla

The expected output would be:

two     xxx
two     yyy
one     zzz
three   000
UNKNOWN bla

Upvotes: 1

Views: 1530

Answers (2)

RavinderSingh13

Reputation: 133528

EDIT: The OP has now added samples, so the solution has been changed to match them.

awk 'BEGIN{split("one,two,three",array,",")} {$1=($1 in array)?array[$1]:"UNKNOWN"} 1' OFS="\t" Input_file

Explanation of the above code:

awk '
BEGIN{                              ##BEGIN block: runs once before any input is read.
  split("one,two,three",array,",")  ##Split the string on commas into array, so array[1]="one", array[2]="two", array[3]="three".
}
{
  $1=($1 in array)?array[$1]:"UNKNOWN" ##If $1 is an index of array, replace it with array[$1]; otherwise replace it with the string UNKNOWN.
}
1                                   ##A condition that is always true with no action, so awk performs the default action and prints the current line.
' OFS="\t" Input_file               ##Set the output field separator to a tab and read Input_file.
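
As a quick check, here is a minimal sketch that runs the same lookup on the sample data from the question. Setting FS to a tab (so that fields containing spaces stay intact) and the shell variable names input and result are assumptions added for illustration; they are not part of the answer above.

```shell
# Sample input from the question, tab-separated.
input=$(printf '2\txxx\n2\tyyy\n1\tzzz\n3\t000\n4\tbla')

# split() fills array[1]="one", array[2]="two", array[3]="three";
# any other value in the first column becomes "UNKNOWN".
result=$(printf '%s\n' "$input" | awk '
BEGIN { FS = OFS = "\t"; split("one,two,three", array, ",") }
{ $1 = ($1 in array) ? array[$1] : "UNKNOWN" }
1')

printf '%s\n' "$result"
```

With this input the printed lines should match the expected output in the question, with a tab between the two columns.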

Original answer, posted before the samples were added and not fully tested. It substitutes every 1, 2, and 3 found in the first field rather than mapping the whole field:

awk 'function check(value){gsub(value,array[value],$1)} BEGIN{split("one,two,three",array,",")} {check(1); check(2); check(3)} 1' Input_file

The same solution in multi-line form:

awk '
function check(value){
  gsub(value,array[value],$1)
}
BEGIN{
  split("one,two,three",array,",")
}
{
  check(1)
  check(2)
  check(3)
}
1'  OFS="\t" Input_file

Test run, using the following Input_file:

cat Input_file
1213121312111122243434onetwothree wguwvrwvrwvbvrwvrvr
vkewjvrkmvr13232424

Running the code produces the following output:

onetwoonethreeonetwoonethreeonetwooneoneoneonetwotwotwo4three4three4onetwothree wguwvrwvrwvbvrwvrvr
vkewjvrkmvronethreetwothreetwo4two4
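
To make the difference between the two approaches concrete, the following sketch feeds the same line to both. The input line and the variable names are made up for illustration; the point is only that gsub rewrites digits anywhere inside the first field, while the array lookup treats the field as a single key.

```shell
line='12 foo'

# Substitution approach: every 1, 2 and 3 inside the first field is rewritten.
sub_result=$(printf '%s\n' "$line" | awk '
function check(value) { gsub(value, array[value], $1) }
BEGIN { split("one,two,three", array, ",") }
{ check(1); check(2); check(3) }
1')

# Lookup approach: the whole first field must be a key of the array.
lookup_result=$(printf '%s\n' "$line" | awk '
BEGIN { split("one,two,three", array, ",") }
{ $1 = ($1 in array) ? array[$1] : "UNKNOWN" }
1')

printf '%s\n%s\n' "$sub_result" "$lookup_result"
```

The first command prints "onetwo foo" and the second prints "UNKNOWN foo", so the lookup form is the one that fits the expected output in the question.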

Upvotes: 1

oliv

Reputation: 13249

Given a dico file containing this:

$ cat dico
1 one
2 two
3 three

You could use this awk script:

awk 'NR==FNR{a[$1]=$2;next}($1 in a){$1=a[$1]}1' dico file.txt

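While reading the first file (NR==FNR), this fills the array a with the contents of dico, keyed on the first field. For each line of file.txt it then replaces the first field with the mapped value when that field is a key in the array. Lines whose first field is not in the array are printed unchanged.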
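
If unmatched values should become UNKNOWN and the output should stay tab-separated, as in the question's expected output, one possible variation is sketched below. The ternary fallback, the OFS setting, and the temporary directory used to hold the two files are assumptions added for this example, not part of the script above.

```shell
# Work in a throwaway directory so the example does not touch real files.
tmp=$(mktemp -d)
printf '1 one\n2 two\n3 three\n' > "$tmp/dico"
printf '2\txxx\n2\tyyy\n1\tzzz\n3\t000\n4\tbla\n' > "$tmp/file.txt"

# First file: load the mapping. Second file: look up the first field,
# falling back to "UNKNOWN" when there is no entry for it.
result=$(awk '
NR == FNR { a[$1] = $2; next }
{ $1 = ($1 in a) ? a[$1] : "UNKNOWN" }
1' OFS='\t' "$tmp/dico" "$tmp/file.txt")

printf '%s\n' "$result"
rm -r "$tmp"
```

This keeps the default field separator, so it assumes that no field in file.txt contains spaces. Keeping the mapping in a separate file means it can grow without touching the awk program.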

Upvotes: 1
