user3347931

Reputation: 319

separate fields by comma using bash

How do I place commas between fields?

Input data

12123 'QA test case 1' 'QA environment'   
12234 'UAT test case 1' 'UAT environment'  

Expected output

12123, 'QA test case 1', 'QA environment'   
12234, 'UAT test case 1', 'UAT environment'  

Upvotes: 1

Views: 2127

Answers (7)

fcr

Reputation: 170

This solution is (I believe) not very pretty, but at least it is standard:

awk 'BEGIN{SP="[:space:]"}{gsub("(["SP"]*('\''[^'\'']*'\''|[^'\''"SP"])+)","&,");if(match($0, (",["SP"]+$")))$0=substr($0,1,RSTART-1)substr($0,RSTART+1)}1'

Some "broken" awk implementations don't support character classes in the [[:foo:]] style; in that case you can use:

awk 'BEGIN{SP=" \t\f\v\r\n"}{gsub("(["SP"]*('\''[^'\'']*'\''|[^'\''"SP"])+)","&,");if(match($0, (",["SP"]+$")))$0=substr($0,1,RSTART-1)substr($0,RSTART+1)}1'

Note: I used '\'' to produce each single-quote character because that's a simple and standard way to do it inside a single-quoted shell string. If you want to put this program in a ".awk" file, just replace every occurrence of '\'' with a single quote.
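
For example, the first version saved as a standalone script (hypothetically named commas.awk and run as awk -f commas.awk file) would look roughly like this:

# commas.awk -- same program, with each '\'' replaced by a plain single quote
BEGIN { SP = "[:space:]" }
{
    gsub("([" SP "]*('[^']*'|[^'" SP "])+)", "&,")
    if (match($0, (",[" SP "]+$")))
        $0 = substr($0, 1, RSTART-1) substr($0, RSTART+1)
}
1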

Upvotes: 0

Jotne

Reputation: 41446

Here is how I handle CSV with awk:

cat file
12123 'QA test case 1' 'QA environment' some more
12234 'UAT test case 1' 'UAT environment'

awk '{for (i=1;i<NF;i++) {if ($i~t) c++;printf "%s"(c%2?FS:", "),$i}print $NF}' t=\' file
12123, 'QA test case 1', 'QA environment', some, more
12234, 'UAT test case 1', 'UAT environment'

This keeps track of how many ' it finds.
If it's 0 2 4 6 etc, you are outside a group: separate with ", "
If it's 1 3 5 7 etc, you are inside a group: separate with FS (a space)


Since you now have a good separator, you can get rid of the '

awk '{for (i=1;i<NF;i++) {if ($i~t) c++;sub(t,"",$i);printf "%s"(c%2?FS:","),$i}sub(t,"",$NF);print $NF}' t=\' file
12123,QA test case 1,QA environment,some,more
12234,UAT test case 1,UAT environment

You could also use FPAT, which defines the fields themselves (as opposed to FS, which defines the separators), but that requires GNU awk 4.x, so it would not be portable.

awk '{for (i=1;i<NF;i++) printf "%s, ",$i;print $NF}' FPAT="[^' ]+|'[^']+'" file
12123, 'QA test case 1', 'QA environment', some, more
12234, 'UAT test case 1', 'UAT environment'

How does FPAT="[^' ]+|'[^']+'" work? (See the quick check below.)
1. A field is one or more characters that are neither ' nor a space: [^' ]+, e.g. some and more.
2. Or a field starts with ', contains one or more non-' characters, and ends with ': '[^']+', e.g. 'test data'.
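
To see how FPAT tokenizes a line, here is a quick check of my own (a sketch, assuming GNU awk 4.x) that prints each field on its own line:

gawk '{for (i=1;i<=NF;i++) print i": "$i}' FPAT="[^' ]+|'[^']+'" file

For the first input line above, this should print 12123, 'QA test case 1', 'QA environment', some and more as five separate fields.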

Upvotes: 1

Ed Morton

Reputation: 203219

$ sed "s/ '/,&/g" file
12123, 'QA test case 1', 'QA environment'
12234, 'UAT test case 1', 'UAT environment'

Upvotes: 2

n0741337

Reputation: 2504

Try this awk:

awk -F" '" '{ print $1, $2, $3 }' OFS=", '" data

or using a BEGIN block:

awk -F" '" 'BEGIN {OFS="," FS} { print $1, $2, $3 }' data

In either case, FS is set to " '" (a space followed by ') and OFS is set to "," followed by that same separator. It's based on the assumption that ' is a valid, unique field separator and that all input data is formatted/arranged as in the question.

Upvotes: 1

mklement0

Reputation: 437176

A naïve bash implementation that assumes that no (escaped) ' instances ever appear inside a field:

  • Original single-quoting is preserved.
  • Accepts any number of input fields.
  • Any fields may be single-quoted.
  • Caveat: whitespace between fields is normalized (replaced with a single space each), as is whitespace inside a quoted field.

Input is assumed to come from file file:

# Read all whitespace-separated tokens (potentially across quoted field boundaries).
while read -ra tkns; do  
  # Initialize per-line variables.
  numTkns=${#tkns[@]} i=0 inField=0
  # Loop over all tokens.
  for tkn in "${tkns[@]}"; do
    # Determine if we're inside a quoted field.
    [[ $tkn == \'* ]] && inField=1
    [[ $tkn == *\' ]] && inField=0
    # Determine the output separator:
    if (( ++i == numTkns )); then
      sep=$'\n' # last token, terminate output line with \n
    else
      # inside a field: use just a space; between fields: use ', '
      (( inField )) && sep=' ' || sep=', '
    fi
    # Output token and separator.
    printf '%s%s' "$tkn" "$sep"
  done
done < file

Upvotes: 2

glenn jackman

Reputation: 246754

Another option is to use a CSV parser:

ruby -rcsv -ne '
  puts CSV.generate_line(
    CSV.parse_line($_.strip, {:col_sep => " ", :quote_char => "'\''"}),
    {:force_quotes => 1}
  )
' file
"12123","QA test case 1","QA environment"
"12234","UAT test case 1","UAT environment"

Upvotes: 3

Costi Ciudatu

Reputation: 38195

Your input data looks very much like an argument list. Therefore, one convenient approach is to define a bash function that simply prints its argument list as comma-separated tokens and to invoke it for each line of your file.

However, the simple implementation below will lose the quotes around the multi-word tokens (but it will place the commas properly). If you need those quotes exactly as they were, it would get a bit more complicated (it's very easy to output every token quoted, though; see the sketch after the code):

#!/bin/bash
# Print the argument list as comma-separated tokens.
function csv_args() {
    while [ -n "$1" ]; do
        echo -n "$1"
        shift
        # Only emit a separator if another argument follows.
        [ -n "$1" ] && echo -n ', '
    done
    echo
}

# eval lets the shell's quote processing split each line into arguments,
# so a single-quoted multi-word field becomes one argument.
while read line; do
    eval csv_args $line
done < /path/to/your/file
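
For instance, here is a sketch of the variant hinted at above that outputs every token single-quoted (note that it also quotes the numeric ID, unlike the original input):

function csv_args_quoted() {
    while [ -n "$1" ]; do
        # Re-quote each token on output.
        echo -n "'$1'"
        shift
        [ -n "$1" ] && echo -n ', '
    done
    echo
}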

Upvotes: 2
