Removing specific strings from strings in a file

Question

I want to remove specific fields from every line of a semicolon-delimited file.

The file looks something like this:

texta1;texta2;texta3;texta4;texta5;texta6;texta7
textb1;textb2;textb3;textb4;textb5;textb6;textb7
textc1;textc2;textc3;textc4;textc5;textc6;textc7

I would like to remove fields 2, 5 and 7 from every line in the file.

Desired output:

texta1;texta3;texta4;texta6
textb1;textb3;textb4;textb6
textc1;textc3;textc4;textc6

I am trying to write a small shell script using awk, but the code is not working as expected: the semicolons between the removed fields and at the end of each line are still there.

(Note: I was able to do it with sed, but my file has several hundred thousand records and the sed code takes a long time.)

Could you please provide some help with this? Thanks in advance.

Wintermute · Accepted Answer

Most simply with cut:

cut -d \; -f 1,3-4,6,8- filename

or

cut -d \; -f 2,5,7 --complement filename

I think --complement is GNU-specific, though. The 8- in the first example is not strictly necessary for a file with only seven columns; it includes every column from the eighth onward, if any exist. I included it because it does no harm and makes the solution more general.
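
If you would rather stay with awk, here is a minimal sketch of the same idea. I have not seen your script, so this is a guess at the cause: blanking fields (for example with $2 = "") leaves their separators behind, which would explain the leftover semicolons. Rebuilding each line from only the fields you keep avoids that. The function name drop_fields and the variable names are made up for illustration.

```shell
# Hypothetical helper: print each input line without fields 2, 5 and 7.
# Reads the files given as arguments, or standard input if there are none.
drop_fields() {
  awk -F';' -v drop='2,5,7' '
    BEGIN {
      # Turn the comma-separated list into a lookup table of fields to skip.
      n = split(drop, d, ",")
      for (i = 1; i <= n; i++) skip[d[i]] = 1
    }
    {
      out = ""; sep = ""
      for (i = 1; i <= NF; i++) {
        if (i in skip) continue   # omit the field and its separator
        out = out sep $i
        sep = FS                  # later kept fields are prefixed with ";"
      }
      print out
    }
  ' "$@"
}

# Example run on the first two sample lines from the question.
printf '%s\n' \
  'texta1;texta2;texta3;texta4;texta5;texta6;texta7' \
  'textb1;textb2;textb3;textb4;textb5;textb6;textb7' | drop_fields
```

On a real file you would call it as drop_fields filename. For a plain column drop like this one, I would still expect cut to be the simpler choice.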
