Pedro Cardoso

Reputation: 89

Generating csv from text file in Linux command line with sed, awk or other

I have a file with thousands of lines that I would like to convert to CSV for later processing.

The original file looks like this:

cc_1527 (ILDO_I173_net9 VSSA) capacitor_mis c=9.60713e-16
cc_1526 (VDD_MAIN Istartupcomp_I115_G7) capacitor_mis \
    c=4.18106e-16
cc_1525 (VDD_MAIN Istartupcomp_I7_net025) capacitor_mis \
    c=9.71462e-16
cc_1524 (VDD_MAIN Istartupcomp_I7_ST_net14) \
    capacitor_mis c=4.6011e-17
cc_1523 (VDD_MAIN Istartupcomp_I7_ST_net15) \
    capacitor_mis c=1.06215e-15
cc_1522 (VDD_MAIN ILDO_LDO_core_Istartupcomp_I7_ST_net16) \
    capacitor_mis c=1.37289e-15
cc_1521 (VDD_MAIN ILDO_LDO_core_Istartupcomp_I7_I176_G4) capacitor_mis \
    c=6.81758e-16

The problem is that some records continue onto the next line, which is indicated by a trailing "\".

The final CSV for the first five lines of the original text (three records) should be:

cc_1527,(ILDO_I173_net9 VSSA),capacitor_mis,c=9.60713e-16
cc_1526,(VDD_MAIN Istartupcomp_I115_G7),capacitor_mis,c=4.18106e-16
cc_1525,(VDD_MAIN Istartupcomp_I7_net025),capacitor_mis,c=9.71462e-16

Each record is now on a single line, and the "\" characters have been removed.

Please note that there may be spaces at the beginning of each line, so these should be trimmed before anything else is done.

Any ideas on how to accomplish this?

Thanks in advance.

Best regards, Pedro

Upvotes: 0

Views: 74

Answers (2)

tshiono

Reputation: 22012

The answer by @Shawn has already been accepted by the OP, so I'm not sure mine is worth posting, but I'll add it for information. If Perl is an option, please try the following script. It preserves the whitespace inside the parentheses instead of replacing it with commas:

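perl -0777 -ne '
    s/\\\n//g;                  # join continuation lines
    foreach $line (split(/\n/)) {
        # a field is a parenthesized group or a run of non-whitespace
        while ($line =~ /(\([^)]+\))|(\S+)/g) {
            push(@ary, $&);
        }
        print join(",", @ary), "\n";
        @ary = ();              # reset for the next line
    }
' input.txt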

Output:

cc_1527,(ILDO_I173_net9 VSSA),capacitor_mis,c=9.60713e-16
cc_1526,(VDD_MAIN Istartupcomp_I115_G7),capacitor_mis,c=4.18106e-16
cc_1525,(VDD_MAIN Istartupcomp_I7_net025),capacitor_mis,c=9.71462e-16
cc_1524,(VDD_MAIN Istartupcomp_I7_ST_net14),capacitor_mis,c=4.6011e-17
cc_1523,(VDD_MAIN Istartupcomp_I7_ST_net15),capacitor_mis,c=1.06215e-15
cc_1522,(VDD_MAIN ILDO_LDO_core_Istartupcomp_I7_ST_net16),capacitor_mis,c=1.37289e-15
cc_1521,(VDD_MAIN ILDO_LDO_core_Istartupcomp_I7_I176_G4),capacitor_mis,c=6.81758e-16

[How it works]

  • First of all, the -0777 option tells Perl to slurp the whole file into its default variable $_ at once. -n wraps the script in an input loop without printing automatically, and -e supplies the script on the command line.
  • Next, s/\\\n//g; joins continuation lines by deleting every backslash that is immediately followed by a newline, together with that newline.
  • Then split(/\n/) splits the text back into lines on the remaining newlines.
  • The regex /(\([^)]+\))|(\S+)/g is the most important part: it divides each line into fields. The field pattern is defined as "a substring surrounded by parentheses OR a substring that contains no whitespace." It works like FPAT in GNU awk and preserves the whitespace between the parentheses instead of splitting the line on it.
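
For comparison, the same field pattern can be sketched in plain awk without relying on FPAT. This is only a minimal sketch and not part of the script above: the function name to_csv is made up for illustration, and it assumes that a continuation backslash is the very last character on its line and that parentheses are never nested.

```shell
# Hypothetical helper (to_csv is an illustrative name, not from the answer).
to_csv() {
    awk '
    # Continuation line: drop the trailing backslash and keep accumulating.
    /\\$/ { sub(/\\$/, ""); buf = buf $0; next }
    {
        line = buf $0
        buf = ""
        out = ""
        # Take the next field each time: a parenthesized group
        # or a run of characters that are not spaces or tabs.
        while (match(line, /\([^)]+\)|[^ \t]+/)) {
            field = substr(line, RSTART, RLENGTH)
            out = (out == "" ? field : out "," field)
            line = substr(line, RSTART + RLENGTH)
        }
        print out
    }' "$@"
}
```

It can be called as to_csv input.txt, or fed from a pipe. Because fields are extracted rather than split on a separator, leading spaces on a line are skipped without a separate trimming step.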

I tested this with an input of approximately 10,000 lines, and it ran in less than a second.
Hope this helps.

Upvotes: 1

Shawn

Reputation: 52354

Using some of the more obscure features of sed (it can do more than s///):

$ sed -E ':line /\\$/ {s/\\$//; N; b line}; s/[[:space:]]+/,/g' demo.txt
cc_1527,(ILDO_I173_net9,VSSA),capacitor_mis,c=9.60713e-16
cc_1526,(VDD_MAIN,Istartupcomp_I115_G7),capacitor_mis,c=4.18106e-16
cc_1525,(VDD_MAIN,Istartupcomp_I7_net025),capacitor_mis,c=9.71462e-16
cc_1524,(VDD_MAIN,Istartupcomp_I7_ST_net14),capacitor_mis,c=4.6011e-17
cc_1523,(VDD_MAIN,Istartupcomp_I7_ST_net15),capacitor_mis,c=1.06215e-15
cc_1522,(VDD_MAIN,ILDO_LDO_core_Istartupcomp_I7_ST_net16),capacitor_mis,c=1.37289e-15
cc_1521,(VDD_MAIN,ILDO_LDO_core_Istartupcomp_I7_I176_G4),capacitor_mis,c=6.81758e-16

Basically:

  • Read a line into the pattern space.

  • :line /\\$/ {s/\\$//; N; b line}: If the pattern space ends in a \, remove that backslash, read the next line and append it to the pattern space, and repeat this step.

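  • s/[[:space:]]+/,/g: Convert every run of one or more whitespace characters into a single comma. This includes the newline left over from joining lines and, as the output above shows, the space inside the parentheses.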

  • Print the result, and go back to the beginning with a new line.
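
The same steps can also be written as a multi-line sed script, which makes each command easier to see. This is a sketch rather than the exact command above: the function name join_and_split is hypothetical, and the two trimming substitutions are an addition meant to cover the leading spaces mentioned in the question.

```shell
# Hypothetical wrapper (join_and_split is an illustrative name).
join_and_split() {
    sed -E '
        :line
        # While the pattern space ends in a backslash: remove it,
        # append the next input line, and check again.
        /\\$/ {
            s/\\$//
            N
            b line
        }
        # Addition to the original command: trim leading and trailing
        # whitespace so no empty first or last field is produced.
        s/^[[:space:]]+//
        s/[[:space:]]+$//
        # Turn every remaining whitespace run into one comma.
        s/[[:space:]]+/,/g
    ' "$@"
}
```

It can be run as join_and_split demo.txt. As with the one-liner, the space inside the parentheses still becomes a comma.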

Upvotes: 1
