Lucy
Lucy

Reputation: 21

Multiline CSV: output on a single line, with double-quoted input lines, using a different separator

I'm trying to get a multiline output from a CSV into one line in Bash.

My CSV file looks like this:

hi,bye
hello,goodbye

The end goal is for it to look like this:

"hi/bye", "hello/goodbye"

This is currently where I'm at:

INPUT=mycsvfile.csv
while IFS=, read col1 col2 || [ -n "$col1" ]
do
  source=$(awk '{print;}' | sed -e 's/,/\//g' )
  echo "$source";
done < $INPUT

The output is on every line and I'm able to change the , to a / but I'm not sure how to put the output on one line with quotes around it.

I've tried BEGIN:

source=$(awk 'BEGIN { ORS=", " }; {print;}'| sed -e 's/,/\//g' )

But this only outputs the last line, and omits the first hi/bye:

hello/goodbye

Would anyone be able to help me?

Upvotes: 2

Views: 1484

Answers (7)

Ed Morton
Ed Morton

Reputation: 203443

$ awk -F, -v OFS='/' -v ORS='"' '{$1=s ORS $1; s=", "; print} END{printf RS}' file
"hi/bye", "hello/goodbye"

Upvotes: 1

JFS31
JFS31

Reputation: 518

Here is my attempt in awk:

awk 'BEGIN{ ORS = " " }{ a++; gsub(/,/, "/"); gsub(/[a-z]+\/[a-z]+/, "\"&\""); print $0; if (a == 1){ print "," }}{ if (a==2){ printf "\n"; a = 0 } }'

Works also if your Input has more than two lines.If you need some explanation feel free to ask :)

Upvotes: 0

isosceleswheel
isosceleswheel

Reputation: 1546

I'm assuming you just have 2 lines in your file? If you have alternating 2 line pairs, let me know in comments and I will expand for that general case. Here is a one-line awk conversion for you:

# NOTE: I am using the octal ascii code for the 
# double quote char (\42=") in my printf statement
$ awk '{gsub(/,/,"/")}NR==1{printf("\42%s\42, ",$0)}NR==2{printf("\42%s\42\n",$0)}' file

output:

"hi/bye", "hello/goodbye"

Upvotes: 0

James Brown
James Brown

Reputation: 37404

In awk:

$ awk '{sub(/,/,"/");gsub(/^|$/,"\"");b=b (NR==1?"":", ")$0}END{print b}' file
"hi/bye", "hello/goodbye"

Explained:

$ awk '
{
    sub(/,/,"/")            # replace comma
    gsub(/^|$/,"\"")        # add quotes
    b=b (NR==1?"":", ") $0  # buffer to add delimiters
}
END { print b }             # output
' file

Upvotes: 0

mklement0
mklement0

Reputation: 437588

If you're willing to install trl, a utility of mine, the command can be simplified as follows:

input=mycsvfile.csv
trl -R '| ' < "$input" | tr ',|' '/,'
  • trl transforms multiline input into double-quoted single-line output separated by ,<space> by default.

    • -R '| ' (temporarily) uses |<space> as the separator instead; this assumes that your data doesn't contain | instances, but you can choose any char. that you know not be part of your data.
  • tr ',|' '/,' then translates all , instances (field-internal to the input lines) into / instances, and all | instances (the temporary separator) into , instances, yielding the overall result as desired.


Installation of trl from the npm registry (Linux and macOS)

Note: Even if you don't use Node.js, npm, its package manager, works across platforms and is easy to install; try
curl -L https://git.io/n-install | bash

With Node.js installed, install as follows:

[sudo] npm install trl -g

Note:

  • Whether you need sudo depends on how you installed Node.js and whether you've changed permissions later; if you get an EACCES error, try again with sudo.
  • The -g ensures global installation and is needed to put trl in your system's $PATH.

Manual installation (any Unix platform with bash)

  • Download this bash script as trl.
  • Make it executable with chmod +x trl.
  • Move it or symlink it to a folder in your $PATH, such as /usr/local/bin (macOS) or /usr/bin (Linux).

Upvotes: 1

mklement0
mklement0

Reputation: 437588

There is no need for a bash loop, which is invariably slow.

sed and tr can do this more efficiently:

input=mycsvfile.csv
sed 's/,/\//g; s/.*/"&", /; $s/, $//' "$input" | tr -d '\n'
  • s/,/\//g uses replaces all (g) , instances with / instances (escaped as \/ here).

  • s/.*/"&", / encloses the resulting line in "...", followed by ,<space>:

    • regex .* matches the entire pattern space (the potentially modified input line)
    • & in the replacement string represent that match.
  • $s/, $// removes the undesired trailing ,<space> from the final line ($)

  • tr -d '\n' then simply removes the newlines (\n) from the result, because sed invariably outputs each line with a trailing newline.

Note that the above command's single-line output will not have a trailing newline; simply append ; printf '\n' if it is needed.

Upvotes: 0

William Pursell
William Pursell

Reputation: 212248

Just do the whole thing (mostly) in awk. The final sed is just here to trim some trailing cruft and inject a newline at the end:

< mycsvfile.csv awk '{print "\""$1, $2"\""}' FS=, OFS=/ ORS=", " | sed 's/, $//'

Upvotes: 1

Related Questions