Abhilipsa Mehra

Reputation: 55

Removing specific strings from strings in a file

I want to remove specific fields from every line of a semicolon-delimited file.

The file looks something like this:

texta1;texta2;texta3;texta4;texta5;texta6;texta7
textb1;textb2;textb3;textb4;textb5;textb6;textb7
textc1;textc2;textc3;textc4;textc5;textc6;textc7

I would like to remove fields 2, 5 and 7 from every line in the file.

Desired output:

texta1;texta3;texta4;texta6
textb1;textb3;textb4;textb6
textc1;textc3;textc4;textc6

I am trying to write a small shell script using awk, but it is not working as expected: the semicolons between the removed fields and at the end of each line are still there.

(Note: I was able to do it with sed, but my file has several hundred thousand records and the sed version takes a long time.)

Could you please help with this? Thanks in advance.

Upvotes: 1

Views: 70

Answers (2)

tommy.carstensen

Reputation: 9622

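I voted the answer by @Wintermute up, but if cut --complement is not available to you or you insist on using awk, then you can rebuild each line from the fields you want to keep. Simply blanking the unwanted fields would leave their separators behind, including a trailing semicolon when the last field is dropped.

awk -v scols=2,5,7 'BEGIN { FS = OFS = ";"; n = split(scols, acols, ","); for (i = 1; i <= n; i++) skip[acols[i]] }
{ out = sep = ""                                    # rebuild the line from kept fields only
  for (i = 1; i <= NF; i++) if (!(i in skip)) { out = out sep $i; sep = OFS }
  print out }' tmp.txt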
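
As a sanity check, here is a minimal sketch that wraps an awk field filter in a shell function and runs it on the three sample lines from the question. The function name drop_fields and the temporary sample file are made up for illustration. The filter builds each output line from the fields that are kept, which is one way to avoid leftover separators; it is not necessarily the fastest option for very large files.

```shell
# drop_fields is a hypothetical helper name used only for this sketch.
# Usage: drop_fields COLS FILE   (COLS is a comma-separated list such as 2,5,7)
drop_fields() {
  awk -v scols="$1" '
    BEGIN {
      FS = OFS = ";"
      n = split(scols, cols, ",")
      for (i = 1; i <= n; i++) skip[cols[i]]   # mark the columns to drop
    }
    {
      out = ""; sep = ""
      for (i = 1; i <= NF; i++) {
        if (!(i in skip)) { out = out sep $i; sep = OFS }
      }
      print out
    }' "$2"
}

# Sample data copied from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'texta1;texta2;texta3;texta4;texta5;texta6;texta7' \
  'textb1;textb2;textb3;textb4;textb5;textb6;textb7' \
  'textc1;textc2;textc3;textc4;textc5;textc6;textc7' > "$sample"

drop_fields 2,5,7 "$sample"
```

Because the column list is passed as an argument, the same function can be reused for other column sets, for example drop_fields 1,3 somefile.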

Upvotes: 0

Wintermute

Reputation: 44023

Most simply with cut:

cut -d \; -f 1,3-4,6,8- filename

or

cut -d \; -f 2,5,7 --complement filename

I believe --complement is GNU-specific, though. The 8- in the first example is not strictly necessary for a file with only seven columns; it includes every column from the eighth onward if any exist. I included it because it does no harm and makes the solution more general.
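
To illustrate the effect of the open-ended 8- range, here is a small sketch that runs the keep-list form of the command on a seven-column line and on a nine-column line. The wrapper name keep_fields and the input lines are invented for the example; only the cut options come from the answer above.

```shell
# keep_fields is a hypothetical wrapper around the keep-list form of cut.
# It uses only -d and -f, so it does not rely on the --complement option.
keep_fields() {
  cut -d ';' -f 1,3-4,6,8- "$@"
}

# Seven columns: the 8- range selects nothing extra.
printf '%s\n' 'a1;a2;a3;a4;a5;a6;a7' | keep_fields

# Nine columns: fields 8 and 9 are kept by the 8- range.
printf '%s\n' 'b1;b2;b3;b4;b5;b6;b7;b8;b9' | keep_fields
```

The first command should print a1;a3;a4;a6 and the second b1;b3;b4;b6;b8;b9.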

Upvotes: 4
