Reputation: 55
I want to remove specific fields from every line of a semicolon-delimited file.
The file looks something like this:
texta1;texta2;texta3;texta4;texta5;texta6;texta7
textb1;textb2;textb3;textb4;textb5;textb6;textb7
textc1;textc2;textc3;textc4;textc5;textc6;textc7
I would like to remove fields 2, 5 and 7 from every line in the file.
Desired output:
texta1;texta3;texta4;texta6
textb1;textb3;textb4;textb6
textc1;textc3;textc4;textc6
I am trying to write a small shell script using 'awk', but the code is not working as expected: the semicolons between the removed fields and at the end of the line are still there.
(Note: I was able to do it with 'sed', but my file has several hundred thousand records and the sed code takes a long time.)
Could you please help with this? Thanks in advance.
Upvotes: 1
Views: 70
Reputation: 9622
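I voted the answer by @Wintermute up, but if cut --complement is not available to you or you insist on using awk, then you can rebuild each line from the fields you want to keep. (Blanking the unwanted fields and then collapsing ";;" leaves a trailing semicolon when the last field is removed, and it also swallows fields that were empty to begin with.)
awk -v scols=2,5,7 'BEGIN { FS = OFS = ";"; n = split(scols, a, ","); for (i = 1; i <= n; i++) drop[a[i]] = 1 }
{ out = sep = ""
  for (i = 1; i <= NF; i++) if (!(i in drop)) { out = out sep $i; sep = OFS }
  print out }' tmp.txt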
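If you need this for more than one column list, the awk approach can be wrapped in a shell function that takes the list as its first argument. This is only a sketch: drop_cols is a made-up name, and it assumes the column numbers are 1-based and comma-separated, as in scols above. It builds each output line from the fields that are kept, so a kept field that happens to be empty stays in place.

```shell
# drop_cols is a hypothetical helper: drop_cols LIST [FILE...]
# LIST is a comma-separated list of 1-based column numbers to remove.
drop_cols() {
  cols=$1
  shift
  awk -v scols="$cols" '
    BEGIN {
      FS = OFS = ";"
      n = split(scols, a, ",")
      for (i = 1; i <= n; i++) drop[a[i]] = 1   # set of columns to skip
    }
    {
      out = ""; sep = ""
      for (i = 1; i <= NF; i++) {
        if (i in drop) continue                 # skip removed columns
        out = out sep $i                        # append a kept field
        sep = OFS
      }
      print out
    }' "$@"
}

# Sample row from the question.
printf '%s\n' 'texta1;texta2;texta3;texta4;texta5;texta6;texta7' | drop_cols 2,5,7
# prints: texta1;texta3;texta4;texta6

# A kept field that is empty (field 3 here) is preserved.
printf '%s\n' 'a;b;;d;e;f;g' | drop_cols 2,5,7
# prints: a;;d;f
```

With no file arguments the function reads standard input, so it can sit in the middle of a pipeline.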
Upvotes: 0
Reputation: 44023
Most simply with cut:
cut -d \; -f 1,3-4,6,8- filename
or
cut -d \; -f 2,5,7 --complement filename
I think --complement is GNU-specific, though. The 8- in the first example is not actually necessary for a file with only seven columns; it would include all columns from the eighth onward if they existed. I included it because it doesn't hurt and provides a more general solution to the problem.
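If the script has to run on systems where you are not sure which cut is installed, one option is to probe for --complement and fall back to the explicit field list. A rough sketch, where drop_257 is a made-up function name and the probe only checks whether cut accepts the option on a throwaway line:

```shell
# drop_257 is a hypothetical helper: removes fields 2, 5 and 7
# from semicolon-delimited input (files given as arguments, or stdin).
drop_257() {
  # Probe: does this cut accept --complement? Output and errors are discarded.
  if printf 'a;b\n' | cut -d ';' -f 2 --complement >/dev/null 2>&1; then
    cut -d ';' -f 2,5,7 --complement "$@"
  else
    # Portable form: list the fields to keep; 8- covers any extra columns.
    cut -d ';' -f 1,3-4,6,8- "$@"
  fi
}

# Sample row from the question.
printf '%s\n' 'texta1;texta2;texta3;texta4;texta5;texta6;texta7' | drop_257
# prints: texta1;texta3;texta4;texta6

# A nine-field line (invented for illustration) shows what the 8- is for.
printf '%s\n' 'x1;x2;x3;x4;x5;x6;x7;x8;x9' | drop_257
# prints: x1;x3;x4;x6;x8;x9
```

Both branches give the same result, so the probe only matters for which spelling of the command gets run.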
Upvotes: 4