Reputation: 390
I'm stuck on something that looks like it should be simple to do with SED.
I have some (kind of) CSV files that I get from another application, so I cannot control their format. Some preprocessing is already done with SED, but I am stuck on the last step. I would like to do it with SED too, if possible, to avoid bringing in a third application.
The problem is that the heading line of the file (first line) is repeated along the file, but unfortunately with the following characteristics:
So, suppose I have the following 2 files:
Cash.csv
Name; Amount
John; 3.55
Erick; 4.76
John; 8.99
Name; Amount
Erick; 4.76
Mark; 1.00
Name; Amount
John; 3.55
Check.csv
Name; Account; Amount
Erick; 345344; 123.00
Mark; 88849; 323.50
Name; Account; Amount
John; 474473; 99.00
Mark; 88849; 323.50
Mark; 88849; 323.50
John; 474473; 99.00
What I want is a single SED script that, applied to each file, turns them into:
Cash.processed.csv
Name; Amount
John; 3.55
Erick; 4.76
John; 8.99
Erick; 4.76
Mark; 1.00
John; 3.55
Check.processed.csv
Name; Account; Amount
Erick; 345344; 123.00
Mark; 88849; 323.50
John; 474473; 99.00
Mark; 88849; 323.50
Mark; 88849; 323.50
John; 474473; 99.00
I was wondering if it's possible to use the SED "hold buffer" as a pattern in the delete command:
1h #Hold the first line (headings)
/\h/d #Use hold buffer as a pattern to delete
This supposes that "\h" would hand the contents of the hold buffer to the delete command as its pattern.
Thanks for any replies;
PS: Please don't answer with the following over-specific command:
1p;/Name; Amount\|Name; Account; Amount/d
Upvotes: 1
Views: 200
Reputation: 58371
This might work for you (GNU sed):
sed '1h;1!{G;/^\(.*\)\n\1/d;s/\n.*//}' file
Explanation:
1h
store the heading line in the hold space (HS) and print.
1!{G;/^\(.*\)\n\1/d;s/\n.*//}
for every line but the first, append a newline followed by the contents of the HS (i.e. the heading line). Compare the first part of the line to the heading line and, if it's the same, delete that line. If it's not, delete the appended newline and heading line and print as normal.
EDIT:
This is indeed very slow on large files; a quicker and perhaps easier to understand solution is:
sed 's|.*|1!{/^&$/d}|;q' file | sed -f - file
This makes a sed script from the first line of the input file.
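To make the two stages visible, here is a minimal sketch that runs them separately on the Cash.csv sample from the question. It assumes GNU sed, writes the generated script to a temporary file instead of piping it into -f -, and uses variable names (tmp, generated, processed) that are only for illustration. The heading is pasted into the regex unchanged, so this also assumes it contains no slashes or regex metacharacters.

```shell
#!/bin/sh
# Sample input taken from the question (Cash.csv), placed in a temp directory.
tmp=$(mktemp -d)
cat > "$tmp/Cash.csv" <<'EOF'
Name; Amount
John; 3.55
Erick; 4.76
John; 8.99
Name; Amount
Erick; 4.76
Mark; 1.00
Name; Amount
John; 3.55
EOF

# Stage 1: rewrite line 1 into a one-line sed script, print it and quit.
sed 's|.*|1!{/^&$/d}|;q' "$tmp/Cash.csv" > "$tmp/dedup.sed"
generated=$(cat "$tmp/dedup.sed")
printf 'generated script: %s\n' "$generated"

# Stage 2: apply the generated script; every line except line 1 that
# matches the heading exactly is deleted.
processed=$(sed -f "$tmp/dedup.sed" "$tmp/Cash.csv")
printf '%s\n' "$processed"

rm -r "$tmp"
```

For this sample the generated script is 1!{/^Name; Amount$/d}, i.e. exactly the kind of hard-coded command the question wanted to avoid typing by hand, but built from whatever the first line happens to be.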
Upvotes: 2
Reputation: 67211
In case you are interested in awk:
awk '{if(NR==1){p=$0;print}if(NR>1 && p!=$0)print}' your_file
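The same idea can be written a little more compactly with pattern/action rules. This is only a sketch: the function name strip_repeated_header and the variable header are made up for the example, and the sample input is a shortened version of Check.csv from the question. Appending "" to $0 forces a string comparison, which is an extra precaution for headings that look like numbers.

```shell
#!/bin/sh
# strip_repeated_header (hypothetical helper): prints the first line, then
# every later line that is not identical to it. Reads the named files, or
# standard input when no file is given.
strip_repeated_header() {
    awk 'NR == 1 { header = $0; print; next }
         ($0 "") != header' "$@"
}

# Usage on a shortened version of Check.csv from the question.
result=$(printf '%s\n' \
    'Name; Account; Amount' \
    'Erick; 345344; 123.00' \
    'Name; Account; Amount' \
    'John; 474473; 99.00' | strip_repeated_header)
printf '%s\n' "$result"
```

Because the comparison is on whole lines as plain strings, no escaping of the heading is needed here.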
Upvotes: 1
Reputation: 753525
I think you'll need to capture the first line from one sed command and then use that in the main operational command:
line1=$(sed 1q $datafile)
sed -e "2,$ {/$line1/d;}" \
-e '...rest of sed script...' $datafile
Because the sed 1q quits after reading the first line, it is quick regardless of how big the data file is. If there's a chance that the first line might contain a slash (heading "Name/Number", perhaps) or other regex metacharacters, then think of using something like this, which replaces all slashes with "." (a dot matches any character, so it still matches the slash):
line1=$(sed '1{s%/%.%g;q;}' $datafile)
I did some futzing with the Mac OS X (10.8.1) version of sed, which is fussier than GNU sed. In the second (main) sed command, the match had to be in {...}, the dollar had to be separate (or the shell gets antsy about invalid parameter substitution), and the semi-colon was needed. Some of those restrictions probably aren't needed with GNU sed, but the code shown is likely to work anywhere.
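If the heading may contain other regex metacharacters as well, one possible variation is to backslash-escape every character that is special in a basic regular expression (plus the / delimiter) and to anchor the pattern so that only whole-line repeats are deleted. The following is a sketch under those assumptions rather than something tested on every sed; the sample data and the variable result are invented for the example.

```shell
#!/bin/sh
# Invented sample data: a heading containing /, [, $ and ].
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
Name/Number; Amount [$]
John/1; 3.55
Name/Number; Amount [$]
Not the Name/Number; Amount [$] line
Erick/2; 4.76
EOF

# Read line 1 and put a backslash in front of ] [ \ . * ^ $ and /.
line1=$(sed -e '1{s%[][\.*^$/]%\\&%g;q;}' "$datafile")

# From line 2 onwards, delete lines that match the escaped heading exactly.
# The anchors ^ and $ keep lines that merely contain the heading.
result=$(sed -e "2,\$ {/^$line1\$/d;}" "$datafile")
printf '%s\n' "$result"

rm -f "$datafile"
```

With the anchors in place, the fourth sample line survives even though it contains the heading text, whereas an unanchored pattern would delete it.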
Upvotes: 4