How to assign awk result variable to an array and is it possible to use awk inside another awk in loop

Question

I've started learning bash and am completely stuck on a task. I have a comma-separated CSV file with records like:

id,location_id,organization_id,service_id,name,title,email,department
1,1,,,Name surname,department1 department2 department3,,
2,1,,,name Surname,department1,,
3,2,,,Name Surname,"department1 department2, department3",,

and so on.

I need to format it this way:

- name and surname must start with a capital letter
- add an email field consisting of the first letter of the name plus the full surname, in lowercase
- create a new CSV with the records from the old CSV and the corrected fields

I split the CSV records into fields using awk with FPAT (because some fields contain a comma between quotes, e.g. "department1 department2, department3").

#!/bin/bash
input="$HOME/test.csv"

exec 0<$input

while read line; do

awk -v FPAT='"[^"]*"|[^,]*' '{ 
  ...
}' "$input"

done

Inside the awk {...} block (NF=8 for each record), I tried to use certain field values ($1 $2 $3 $4 $5 $6 $7 $8):

#it doesn't work 

IFS=' ' read -a name_surname<<<$5 # Field 5 corresponds to *name* in the CSV header

# Could I use an inner awk on a field value of the outer awk ($5) to split that value?
# as an example:
# $5="${awk '{${1^}${2^}}' $5}"
# where ${1^} and ${2^} are fields of the inner awk
  
name_surname[0]=${name_surname[0]^}
name_surname[1]=${name_surname[1]^}
  
$5="${name_surname[0]}' '${name_surname[1]}"

email_name=${name_surname[0]:0:1}
email_surname=${name_surname[1]}
domain='@domain'

$7="${email_name,}${email_surname,,}$domain" # corresponds to field 7, *email*, in the CSV header

How can I add the field values ($1 $2 $3 $4 $5 $6 $7 $8) to an array and call a join function on each loop iteration to add the record to a new CSV file?

function join { local IFS="$1"; shift; echo "$*"; }
result=$(join , ${arr[@]})
echo $result >> new.csv
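
For the join step on its own, a minimal sketch in bash, assuming the field values are already held in a shell array. The name join_by and the sample values are hypothetical, chosen only for illustration. Quoting "${arr[@]}" keeps empty fields and fields containing spaces as single arguments, which the unquoted ${arr[@]} does not:

```shell
#!/usr/bin/env bash

# join_by (hypothetical name, chosen to avoid shadowing the coreutils
# join command): print all remaining arguments separated by the first one.
join_by() {
    local IFS="$1"   # "$*" joins arguments with the first character of IFS
    shift
    printf '%s\n' "$*"
}

# Sample values made up for illustration; "" stands for an empty field.
arr=(1 1 "" "" "Name Surname" "department1" "nsurname@example.com" "")

# Quote the expansion so every element stays one argument.
result=$(join_by , "${arr[@]}")
printf '%s\n' "$result"
```

With the sample array this should print 1,1,,,Name Surname,department1,nsurname@example.com, and the line could be appended to the new file with printf '%s\n' "$result" >> new.csv.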

Ed Morton · Accepted Answer

This may be what you're trying to do (using gawk for FPAT, as you were already doing), but without more representative sample input and the expected output it's a guess:

$ cat tst.sh
#!/usr/bin/env bash

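awk '
BEGIN {
    OFS = ","
    # gawk only: a field is either a run of non-commas or a double-quoted string
    FPAT = "[^"OFS"]*|\"[^\"]*\""
}
NR > 1 {                        # skip the header line
    # split the name on whitespace: name[1] is the first name, name[n] the surname
    n = split($5,name,/\s+/)
    $7 = tolower(substr(name[1],1,1) name[n]) "@example.com"
    print
}
' "${@:--}"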

$ ./tst.sh test.csv
1,1,,,Name surname,department1 department2 department3,nsurname@example.com,
2,1,,,name Surname,department1,nsurname@example.com,
3,2,,,Name Surname,"department1 department2, department3",nsurname@example.com,

I put the awk script inside a shell script since that looks like what you want. You don't need to do that, though: you could just save the awk script in a file and invoke it with awk -f.
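
The script above fills in the email but leaves field 5 as it was, while the question also asks for the name and surname to start with a capital letter. A minimal sketch of that step, assuming names are plain ASCII words separated by whitespace; cap() and cap_name are hypothetical helper names, not part of the script above:

```shell
#!/usr/bin/env bash

# cap_name (hypothetical helper): read names on stdin, one per line,
# and print them with the first letter of every word upper-cased.
cap_name() {
    awk '
    # cap(s): upper-case the first character of s, keep the rest unchanged
    function cap(s) { return toupper(substr(s, 1, 1)) substr(s, 2) }
    {
        n = split($0, word, " ")      # " " splits on runs of whitespace
        out = ""
        for (i = 1; i <= n; i++)
            out = out (i > 1 ? " " : "") cap(word[i])
        print out
    }'
}

printf '%s\n' 'name surname' | cap_name
```

For a two-word name, the same cap() function could presumably be defined in the awk script above and applied inside the NR > 1 block as $5 = cap(name[1]) " " cap(name[n]) before the print.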
