Reputation: 171
I have a CSV file in which some rows have an empty first field and others have content in the first field. The rows with content in the first field are header rows.
I would like to remove every unnecessary header row. The best way I can see of doing this is by deleting every row that has content in the first field and is not followed by a row with an empty first field.
I do not necessarily need to keep the data in the same file, so I can see this being possible using grep, awk, or sed, but none of my attempts have come close to working.
Example input:
header1,value1,etc
,value2,etc
header2,value3,etc
header3,value4,etc
,value5,etc
Desired output:
header1,value1,etc
,value2,etc
header3,value4,etc
,value5,etc
Since the header2 line is not followed by a line with an empty field 1, it is an unnecessary header row.
Upvotes: 0
Views: 36
Reputation: 246807
This kind of task is often conceptually easier if you reverse the file and check whether the previous line is a header:
tac file |
awk -F, '$1 && have_header {next} {print; have_header = length($1)}' |
tac
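As a quick check, the pipeline can be run on the sample input from the question. This is only a sketch: it assumes tac is installed (it is part of GNU coreutils), and the temporary file and the variable names sample, result are made up for illustration.

```shell
# Recreate the question's sample input in a temporary file (hypothetical location).
sample=$(mktemp)
printf '%s\n' \
  'header1,value1,etc' \
  ',value2,etc' \
  'header2,value3,etc' \
  'header3,value4,etc' \
  ',value5,etc' > "$sample"

# Reverse the file, skip a header when the line printed just before it
# (in reversed order) was also a header, then restore the original order.
result=$(tac "$sample" |
  awk -F, '$1 && have_header {next} {print; have_header = length($1)}' |
  tac)

printf '%s\n' "$result"
rm -f "$sample"
```

In reversed order, header2 comes right after header3, so it is the one that gets skipped.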
Upvotes: 0
Reputation: 241721
awk -F, '$1{h=$0;next}h{print h;h=""}1' file
-F, : Use comma as the field separator.
$1{h=$0;next} : If the first field has data (other than 0), save the line and go on to the next line.
h{print h;h=""} : If there is a saved header line, print it and forget it. (This can only execute if there is nothing in $1, because of the next above.)
1 : Print the current line.
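If a header could literally be 0, the caveat above can be sidestepped by comparing the first field with the empty string and tracking the saved header with a separate flag. The following is a sketch of that variant rather than the command from this answer; the sample rows, the temporary file, and the names input, filtered, and have are assumptions made for illustration.

```shell
# Sample data in which the first header is the literal value 0 (hypothetical input).
input=$(mktemp)
printf '%s\n' \
  '0,value1,etc' \
  ',value2,etc' \
  'header2,value3,etc' \
  'header3,value4,etc' \
  ',value5,etc' > "$input"

filtered=$(awk -F, '
  $1 != "" { h = $0; have = 1; next }  # header row: remember it, print nothing yet
  have     { print h; have = 0 }       # data row after a header: emit the saved header once
  1                                    # print the data row itself
' "$input")

printf '%s\n' "$filtered"
rm -f "$input"
```

A later header simply overwrites an earlier saved one, so only the header that is directly followed by a data row is ever printed.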
Upvotes: 4