samsamara

Reputation: 4750

Remove lines from a text file where a column value is repeated (Ubuntu)

I have a text file like below.

1 1223 abc
2 4234 weroi
0 3234 omsder
1 1111 abc 
2 6666 weroi

I want the values in column 3 to be unique, so the output should be the file below.

1 1223 abc
2 4234 weroi
0 3234 omsder

Can I do this with basic Linux commands, without using Java or anything like that?

Upvotes: 0

Views: 121

Answers (1)

Aserre

Reputation: 5062

You could do this with a bit of awk scripting. Here is a one-liner that addresses your problem:

awk 'BEGIN {col=3; sep=" "; forbidden=sep} {if (match(forbidden, sep $col sep) == 0) {forbidden=forbidden $col sep; print $0}}' input.file

The BEGIN block declares the forbidden string, which keeps track of the 3rd-column values seen so far. Then, for each line, match checks whether the value in the 3rd column already appears in forbidden. If it does not, the script appends that value to the forbidden list and prints the whole line.
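For readability, the same logic can be laid out as a multi-line script. This is purely a reformatting of the one-liner above with comments added; the behavior is identical:

awk '
BEGIN {
    col = 3          # column whose values must be unique
    sep = " "        # separator used inside the forbidden string
    forbidden = sep  # running list of values already printed
}
{
    # print the line only if its value in column col has not been seen yet
    if (match(forbidden, sep $col sep) == 0) {
        forbidden = forbidden $col sep
        print $0
    }
}' input.file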

Here, sep=" " initializes the separator. We put sep between each forbidden value to avoid accidental matches created by placing several values next to one another. For instance, consider this input:

1 1111 ta
2 2222 to
3 3333 t
4 4444 tato

In this case, without a separator, the forbidden string would read tato after the first two lines, so t and tato would wrongly be treated as already seen and lines 3 and 4 would be dropped. We use " " as the separator because it is awk's default field separator, so a field can never contain a space.
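To make this concrete, here is a small illustrative check (not part of the original answer) showing that a bare substring matches while a separator-wrapped value does not:

awk 'BEGIN { print match("tato", "t"), match(" ta to ", " t ") }'
# prints "1 0": "t" is found inside "tato", but " t " is not found in " ta to "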

Note that if you want to deduplicate on a different column, just adapt col=3 to the column number you need (0 for the whole line, 1 for the first column, 2 for the second, and so on).
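If you change the column often, a small variant (not in the original answer, but standard awk) is to pass the column number on the command line with the -v option instead of editing the script, e.g. to deduplicate on the first column:

awk -v col=1 'BEGIN {sep=" "; forbidden=sep} {if (match(forbidden, sep $col sep) == 0) {forbidden=forbidden $col sep; print $0}}' input.file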

Upvotes: 1
