A.Matrosov

Reputation: 15

How to assign an awk result to an array, and is it possible to use awk inside another awk in a loop?

I've started learning bash and am totally stuck on this task. I have a comma-separated CSV file with records like:

id,location_id,organization_id,service_id,name,title,email,department
1,1,,,Name surname,department1 department2 department3,,
2,1,,,name Surname,department1,,
3,2,,,Name Surname,"department1 department2, department3",,
etc.

I need to format it so that the name and surname each start with a capital letter.

I split the CSV into records using awk (because some fields contain a comma between quotes, e.g. "department1 department2, department3").

#!/bin/bash
input="$HOME/test.csv"

exec 0<$input

while read line; do

awk -v FPAT='"[^"]*"|[^,]*' '{ 
  ...
}' $input)

done

Inside awk {...} (NF=8 for each record), I tried to use the field values ($1 $2 $3 $4 $5 $6 $7 $8):

#it doesn't work 

IFS=' ' read -a name_surname<<<$5 # Field 5 match to *name* in heading of csv

# Could I use inner awk with field values of outer awk ($5) to separate the field value of outer awk $5 ? 
# as an example:                                  
# $5="${awk '{${1^}${2^}}' $5}"
# where ${1^} and ${2^} fields of inner awk
  
name_surname[0]=${name_surname[0]^}
name_surname[1]=${name_surname[1]^}
  
$5="${name_surname[0]}' '${name_surname[1]}"

email_name=${name_surname[0]:0:1}
email_surname=${name_surname[1]}
domain='@domain'

$7="${email_name,}${email_surname,,}$domain" # match to field 7 *email* in heading of csv

How can I add the field values ($1 $2 $3 $4 $5 $6 $7 $8) to an array and call the join function on each loop iteration to append the record to a new CSV file?

function join { local IFS="$1"; shift; echo "$*"; }
result=$(join , ${arr[@]})
echo $result >> new.csv  

Upvotes: 0

Views: 694

Answers (2)

A.Matrosov

Reputation: 15

Completely working answer by Ed Morton.

In case it helps someone, I added one more check: if the CSV file contains more than one email address with the same name, an index number is appended to the email local part. The output is written to a file.

#!/usr/bin/env bash
input="$HOME/test.csv"
exec 0<"$input" #stdin now reads from the CSV unless a file argument is given

awk '
BEGIN {
  OFS = ","
  FPAT = "[^"OFS"]*|\"[^\"]*\""
}

(NR == 1) {print} #header of csv
(NR > 1) {

  if (length($0) > 1) { #exclude empty lines
    n = split($5,name,/\s*/)
    email_local_part = tolower(substr(name[1],1,1) name[n])

    #seen[] counts how many times each local part has already appeared;
    #an exact key lookup avoids counting addresses that merely share a prefix
    count = seen[email_local_part]++

    #first occurrence stays as-is; later ones get 1, 2, ... appended
    if (count == 0) { $7 = email_local_part "@abc.com" }
    else { $7 = email_local_part count "@abc.com" }

    print
  }
} 
' "${@:--}" > new.csv

Upvotes: 0

Ed Morton

Reputation: 203393

This may be what you're trying to do (using gawk for FPAT as you already were doing) but without more representative sample input and the expected output it's a guess:

$ cat tst.sh
#!/usr/bin/env bash

awk '
BEGIN {
    OFS = ","
    FPAT = "[^"OFS"]*|\"[^\"]*\""
}
NR > 1 {
    n = split($5,name,/\s*/)
    $7 = tolower(substr(name[1],1,1) name[n]) "@example.com"
    print
}
' "${@:--}"

$ ./tst.sh test.csv
1,1,,,Name surname,department1 department2 department3,nsurname@example.com,
2,1,,,name Surname,department1,nsurname@example.com,
3,2,,,Name Surname,"department1 department2, department3",nsurname@example.com,

I put the awk script inside a shell script since that looks like what you want. Obviously you don't need to do that; you could just save the awk script in a file and invoke it with awk -f.
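The script above fills in the email but leaves the name field untouched. Below is a minimal sketch of the capitalization step the question asks for, assuming every space-separated word in the name should start with an upper-case letter and the remaining letters stay as they are. The names cap and capitalize_words are made up for illustration, and the sketch uses only POSIX awk features, so it does not depend on gawk:

```shell
#!/usr/bin/env bash

# Hypothetical helper: upper-case the first letter of every
# space-separated word on each input line.
capitalize_words() {
    awk '
    # cap(s): return s with the first character of each word upper-cased.
    # i, n, w and out are local variables (awk idiom: extra parameters).
    function cap(s,    i, n, w, out) {
        n = split(s, w, " ")      # a single-space separator splits on runs of blanks
        out = ""
        for (i = 1; i <= n; i++)
            out = out (i > 1 ? " " : "") toupper(substr(w[i], 1, 1)) substr(w[i], 2)
        return out
    }
    { print cap($0) }
    '
}

printf '%s\n' 'name surname' | capitalize_words
```

In the gawk script above, the same cap() function could be pasted in front of the BEGIN block and applied with $5 = cap($5) before print. Note that this is only a sketch: if a name field were quoted, its first character would be the quote, so the first word would not be capitalized.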

Upvotes: 2
