Reputation: 57

How can you remove specific duplicate strings in a file in Linux?

I have a list of data paired with IP addresses. I want each IP address to appear only once, and I don't want to change the order of the lines.

192.168.0.100    fred is happy
192.168.0.100    fred likes pie
192.168.0.100    pie is good
192.168.0.110    tom like cake
192.168.0.110    cake is good
192.168.0.110    pie is better
192.168.0.112    bill like lettuce
192.168.0.112    lettuce is good for you
192.168.0.112    cake and pie are better tasting than lettuce

What I want to do is remove only the duplicate IP addresses and leave everything else exactly the same.

I want to make it look like this:

192.168.0.100    fred is happy
                 fred likes pie
                 pie is good
192.168.0.110    tom like cake
                 cake is good
                 pie is better
192.168.0.112    bill like lettuce
                 lettuce is good for you
                 cake and pie are better tasting than lettuce

I don't want to touch any of the duplicate words, and I can't change the order.

Thank you if you can help

Upvotes: 1

Answers (5)

Scrutinizer

Reputation: 9946

One more:

awk 'A[$1]++{s=$1; gsub(/./,FS,s); sub($1,s)}1' file
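
Spelled out with comments, the one-liner reads roughly as follows. This is only a sketch for readability: the wrapper name blank_seen_keys is made up for illustration, and the behaviour is meant to be unchanged, including the facts that sub($1, s) treats the IP as a regular expression and that a key is blanked whenever it has appeared anywhere earlier in the input, not only on the previous line.

```shell
# Hypothetical wrapper name; reads the named files or standard input.
blank_seen_keys() {
  awk '
    A[$1]++ {              # true from the second time a first field is seen
      s = $1               # copy the key
      gsub(/./, FS, s)     # turn every character of the copy into FS (a space by default)
      sub($1, s)           # replace the first match of the key in the line with the spaces
    }
    1                      # always-true pattern: print the (possibly modified) line
  ' "$@"
}
```

It would be called as blank_seen_keys file, or at the end of a pipeline.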

Upvotes: 1

Ed Morton

Reputation: 204638

This will work no matter what kind of spacing and/or RE metacharacters are in the file:

$ awk '
{ key = $1 }
key == prev { sub(/[^[:space:]]+/,sprintf("%*s",length(key),"")) }
{ prev = key; print }
' file
192.168.0.100    fred is happy
                 fred likes pie
                 pie is good
192.168.0.110    tom like cake
                 cake is good
                 pie is better
192.168.0.112    bill like lettuce
                 lettuce is good for you
                 cake and pie are better tasting than lettuce

Beware of solutions that use $1 in an RE context: the "."s in an IP address are RE metacharacters that mean "any character", so such solutions might work for some sample data but could produce false matches given other input.
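
To make that warning concrete, here is a small sketch comparing a dynamic regexp match with a literal string comparison. The helper names and the sample strings are made up for illustration and do not come from the question.

```shell
# Hypothetical helpers: each prints "match" or "no match" for a string ($1) and a key ($2).
regexp_test()  { awk -v s="$1" -v k="$2" 'BEGIN { print ((s ~ k)  ? "match" : "no match") }'; }
literal_test() { awk -v s="$1" -v k="$2" 'BEGIN { print ((s == k) ? "match" : "no match") }'; }

# Used as a regexp, each "." in the key matches any character: prints "match".
regexp_test  '192x168y0z1' '192.168.0.1'

# Compared as plain strings, the two differ: prints "no match".
literal_test '192x168y0z1' '192.168.0.1'
```

The script above avoids the problem by comparing keys with == and by matching the field with the fixed regexp /[^[:space:]]+/ rather than with $1 itself.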

Upvotes: 2

potong

Reputation: 58578

This might work for you (GNU sed):

sed -r '1{:a;p;h;s/\s.*//;s/./ /g;H;d};G;s/^(\S+)(\s.*)\n\1.*\n(.*)/\3\2/;t;s/\n.*//;ba' file

Print the first record, and every later record where the key changes, and store it in the hold space together with its key converted to spaces. For each subsequent record, compare the stored key with the current key; if they match, replace the current key with the stored spaces. If they do not match, remove the stored data from the pattern space and repeat from the beginning.
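
The same script can be laid out one command per line with comments, which may make the description easier to follow. This is a sketch that assumes GNU sed (for -r, \s and \S); the wrapper name blank_repeats_sed is hypothetical.

```shell
# Hypothetical wrapper name; requires GNU sed.
blank_repeats_sed() {
  sed -r '
    1{
      :a
      # Print the record that starts a new key and save it in the hold space.
      p
      h
      # Reduce the pattern space to the key, turn it into spaces,
      # and append those spaces to the hold space.
      s/\s.*//
      s/./ /g
      H
      # Start the next cycle without printing again.
      d
    }
    # Append the saved record and its spaces to the current record.
    G
    # Same key as the saved record: swap the key for the spaces, then print.
    s/^(\S+)(\s.*)\n\1.*\n(.*)/\3\2/
    t
    # Different key: drop what G appended and treat this record as a new start.
    s/\n.*//
    ba
  ' "$@"
}
```

It would be called as blank_repeats_sed file.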

Upvotes: 0

konsolebox

Reputation: 75618

Using awk:

awk 'BEGIN{FS=OFS="    "}{t=$1;if(t in a){gsub(/./," ",$1);a[t]=a[t]RS$0}else{a[t]=$0;k[++n]=t}}END{for(i=1;i<=n;i++)print a[k[i]]}' file

Output:

192.168.0.100    fred is happy
                 fred likes pie
                 pie is good
192.168.0.110    tom like cake
                 cake is good
                 pie is better
192.168.0.112    bill like lettuce
                 lettuce is good for you
                 cake and pie are better tasting than lettuce

Upvotes: 1

Kent

Reputation: 195269

Assuming the separator between the IP and the text is a tab, this one-liner should work for you:

awk -F'\t' -v OFS='\t' 'a[$1]{gsub(/./," ",$1);print;next}{a[$1]=1}7' file

Test with your file:

kent$  awk -F'\t' -v OFS='\t' 'a[$1]{gsub(/./," ",$1);print;next}{a[$1]=1}7' f
192.168.0.100   fred is happy
                fred likes pie
                pie is good
192.168.0.110   tom like cake
                cake is good
                pie is better
192.168.0.112   bill like lettuce
                lettuce is good for you
                cake and pie are better tasting than lettuce
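
Since this relies on the separator being a tab, it may be worth checking the file first. A small sketch (the function name count_tab_lines is made up) that counts the lines containing at least one tab:

```shell
# Hypothetical helper: print how many input lines contain a tab character.
count_tab_lines() {
  awk 'index($0, "\t") { n++ } END { print n + 0 }' "$@"
}
```

If count_tab_lines file prints 0, the columns are separated by spaces and -F'\t' will not split them.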

Upvotes: 1
