Reputation: 89
I have a file with thousands of lines that I would like to convert to CSV for later processing.
The original file looks like this:
cc_1527 (ILDO_I173_net9 VSSA) capacitor_mis c=9.60713e-16
cc_1526 (VDD_MAIN Istartupcomp_I115_G7) capacitor_mis \
c=4.18106e-16
cc_1525 (VDD_MAIN Istartupcomp_I7_net025) capacitor_mis \
c=9.71462e-16
cc_1524 (VDD_MAIN Istartupcomp_I7_ST_net14) \
capacitor_mis c=4.6011e-17
cc_1523 (VDD_MAIN Istartupcomp_I7_ST_net15) \
capacitor_mis c=1.06215e-15
cc_1522 (VDD_MAIN ILDO_LDO_core_Istartupcomp_I7_ST_net16) \
capacitor_mis c=1.37289e-15
cc_1521 (VDD_MAIN ILDO_LDO_core_Istartupcomp_I7_I176_G4) capacitor_mis \
c=6.81758e-16
The problem is that some of the lines continue onto the next one, which is indicated by a trailing "\" character.
The final CSV output for the first 5 lines of the original text should be:
cc_1527,(ILDO_I173_net9 VSSA),capacitor_mis,c=9.60713e-16
cc_1526,(VDD_MAIN Istartupcomp_I115_G7),capacitor_mis,c=4.18106e-16
cc_1525,(VDD_MAIN Istartupcomp_I7_net025),capacitor_mis,c=9.71462e-16
So each record is now on a single line, and the "\" characters have been removed.
Please note that there may be spaces at the beginning of each line, so these should be trimmed before anything else is done.
Any idea how to accomplish this?
Thanks in advance.
Best regards, Pedro
Upvotes: 0
Views: 74
Reputation: 22012
The answer by @Shawn has been accepted by the OP, and I'm not sure whether my answer is worth posting, but allow me to add it for information.
If Perl is an option, please try the following script, which preserves the whitespace within parens instead of replacing it with commas:
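perl -0777 -ne '
    s/\\\n//g;                          # join continued lines
    foreach $line (split(/\n/)) {
        # a field is a parenthesized group or a run of non-whitespace
        while ($line =~ /(\([^)]+\))|(\S+)/g) {
            push(@ary, $&);
        }
        print join(",", @ary), "\n";
        @ary = ();                      # reset for the next line
    }
' input.txt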
Output:
cc_1527,(ILDO_I173_net9 VSSA),capacitor_mis,c=9.60713e-16
cc_1526,(VDD_MAIN Istartupcomp_I115_G7),capacitor_mis,c=4.18106e-16
cc_1525,(VDD_MAIN Istartupcomp_I7_net025),capacitor_mis,c=9.71462e-16
cc_1524,(VDD_MAIN Istartupcomp_I7_ST_net14),capacitor_mis,c=4.6011e-17
cc_1523,(VDD_MAIN Istartupcomp_I7_ST_net15),capacitor_mis,c=1.06215e-15
cc_1522,(VDD_MAIN ILDO_LDO_core_Istartupcomp_I7_ST_net16),capacitor_mis,c=1.37289e-15
cc_1521,(VDD_MAIN ILDO_LDO_core_Istartupcomp_I7_I176_G4),capacitor_mis,c=6.81758e-16
[How it works]
The -0777 -ne options tell Perl to slurp all lines into Perl's default variable $_.
s/\\\n//g; removes trailing backslashes by merging lines.
split(/\n/) splits the text back into lines on newlines.
/(\([^)]+\))|(\S+)/g is the most important part: it divides each line into fields. The field pattern is defined as "a substring surrounded by parens OR a substring that contains no whitespace." It works like FPAT in awk and preserves whitespace between parens without dividing the line there.

I've tested with an input of approx. 10,000 lines, and the execution time is less than a second.
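For readers who prefer Python, the same idea (merge continued lines, then pick out fields with a pattern instead of splitting on whitespace) might be sketched as follows. The function name netlist_to_csv and the sample text are made up for illustration, and unlike the Perl script this sketch skips blank lines rather than printing empty rows.

```python
import re

# A field is either a parenthesized group or a run of non-whitespace characters.
FIELD = re.compile(r"\([^)]+\)|\S+")

def netlist_to_csv(text):
    """Join backslash-continued lines, then build one comma-separated row per record."""
    joined = re.sub(r"\\\n", "", text)  # drop each backslash-newline pair
    rows = []
    for line in joined.splitlines():
        fields = FIELD.findall(line)    # leading spaces never become fields
        if fields:                      # assumption: blank lines are skipped
            rows.append(",".join(fields))
    return rows

# Hypothetical sample with leading spaces and one continued line.
sample = (
    "cc_1527 (ILDO_I173_net9 VSSA) capacitor_mis c=9.60713e-16\n"
    "  cc_1526 (VDD_MAIN Istartupcomp_I115_G7) capacitor_mis \\\n"
    "    c=4.18106e-16\n"
)
for row in netlist_to_csv(sample):
    print(row)
```

Because the fields are matched rather than split, leading spaces on a line are ignored without a separate trimming step.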
Hope this helps.
Upvotes: 1
Reputation: 52354
Using some of the more obscure features of sed (it can do more than s///):
$ sed -E ':line /\\$/ {s/\\$//; N; b line}; s/[[:space:]]+/,/g' demo.txt
cc_1527,(ILDO_I173_net9,VSSA),capacitor_mis,c=9.60713e-16
cc_1526,(VDD_MAIN,Istartupcomp_I115_G7),capacitor_mis,c=4.18106e-16
cc_1525,(VDD_MAIN,Istartupcomp_I7_net025),capacitor_mis,c=9.71462e-16
cc_1524,(VDD_MAIN,Istartupcomp_I7_ST_net14),capacitor_mis,c=4.6011e-17
cc_1523,(VDD_MAIN,Istartupcomp_I7_ST_net15),capacitor_mis,c=1.06215e-15
cc_1522,(VDD_MAIN,ILDO_LDO_core_Istartupcomp_I7_ST_net16),capacitor_mis,c=1.37289e-15
cc_1521,(VDD_MAIN,ILDO_LDO_core_Istartupcomp_I7_I176_G4),capacitor_mis,c=6.81758e-16
Basically:
Read a line into the pattern space.
:line /\\$/ {s/\\$//; N; b line}: If the pattern space ends in a \, remove that backslash, read the next line and append it to the pattern space, and repeat this step.
s/[[:space:]]+/,/g: Convert every run of 1 or more whitespace characters to a single comma.
Print the result, and go back to the beginning with a new line.
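The same loop can be sketched line by line in Python, which may make the control flow easier to follow. This is only an illustration: the name join_and_split is hypothetical, the continuation is joined with a space where sed's N would insert a newline, and the strip() call is an addition that trims leading and trailing whitespace, which the sed command does not do. As in the sed version, whitespace inside the parentheses also becomes a comma.

```python
import re

def join_and_split(lines):
    """Accumulate backslash-continued lines, then turn whitespace runs into commas."""
    pending = ""
    for raw in lines:
        pending += raw.rstrip("\n")
        if pending.endswith("\\"):
            # Drop the backslash, keep a separator, and read the next line.
            pending = pending[:-1] + " "
            continue
        # strip() is an addition: it avoids a leading or trailing comma.
        yield re.sub(r"\s+", ",", pending.strip())
        pending = ""
    if pending.strip():  # input ended in the middle of a continuation
        yield re.sub(r"\s+", ",", pending.strip())

# Hypothetical sample: one record split over two physical lines.
demo = [
    "cc_1524 (VDD_MAIN Istartupcomp_I7_ST_net14) \\\n",
    "  capacitor_mis c=4.6011e-17\n",
]
for row in join_and_split(demo):
    print(row)
```

Passing an open file object instead of the demo list works the same way, since the function only iterates over lines.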
Upvotes: 1