user2061101

Reputation: 1

remove a line with a duplicate string from multiple lines of a file

I have a file with many lines like the following, containing the paths of some files.

../../ds1_src/wrapper/memory/beh/mem_1w1r_8x160.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x134.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x178.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_20x68.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_280x128.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_8x160.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_8x134.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_64x7.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_24x128.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_8x178.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_20x68.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_280x128.v
../../ds3_src/wrapper/memory/beh/mem_1w1r_1x160.v
../../ds3_src/wrapper/memory/beh/mem_1w1r_1x128.v
../../us_src/wrapper/memory/beh/mem_1w1r_128x8.v
../../us_src/wrapper/memory/beh/mem_1w1r_8x160.v
../../us_src/wrapper/memory/beh/mem_1w1r_8x178.v
../../us_src/wrapper/memory/beh/mem_1w1r_20x68.v
../../us_src/wrapper/memory/beh/mem_1w1r_280x128.v
../../src/sw/mem_1w1r_8x31.v
../../src/sw/mem_1w1r_8x35.v

Some of these are duplicate files.

I want this list to be sorted and uniquified so that lines pointing to duplicate files are removed.

% grep -r "mem_1w" rtl_list | awk '{split($0,a,"/"); print a[7]}' | sort -u

I can do something like the above to uniquify the file names, but that chops off the first part of each line (../../ds1_src/... etc.). Also, for shorter paths such as ../../src/sw/mem_1w1r_8x31.v the file name ends up in a[5] instead of a[7], so duplicates can appear there as well.

mem_1w1r_128x8.v
mem_1w1r_145x133.v
mem_1w1r_1x128.v
mem_1w1r_1x160.v
mem_1w1r_20x68.v
mem_1w1r_24x128.v
mem_1w1r_280x128.v
mem_1w1r_64x7.v
mem_1w1r_73x133.v
mem_1w1r_8x134.v
mem_1w1r_8x160.v
mem_1w1r_8x178.v

How can I sort this out to get something like this:

../../ds1_src/wrapper/memory/beh/mem_1w1r_8x160.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x134.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_64x7.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_1x160.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_1x128.v

which would remove any duplicate files existing in different locations?

Upvotes: 0

Views: 242

Answers (4)

Martin Ellis

Reputation: 9651

You could use the following to make the list unique by filename:

awk -F/ '!F[$NF] && F[$NF]=$0'

This will only include the path to the first file with each unique name. You can then pipe the result through sort, but no need for the -u option.

This works by building an array in awk. Each index is the filename without the path (with the -F/ option, the filename is simply $NF). Each element is the full path to the first file with that filename (the full path is $0). New array entries are only added and printed if the filename has not been seen before.
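For example, assuming the list is in a file called rtl_list (the file name used in the question's grep command), the full pipeline might look like this:

# keep the first path seen for each filename, then sort the surviving lines
awk -F/ '!F[$NF] && F[$NF]=$0' rtl_list | sort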

Upvotes: 0

Steve

Reputation: 54402

Here's one way using awk:

awk -F "/" '/mem_1w/ && !a[$NF]++' file

Results:

../../ds1_src/wrapper/memory/beh/mem_1w1r_8x160.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x134.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x178.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_20x68.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_280x128.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_64x7.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_24x128.v
../../ds3_src/wrapper/memory/beh/mem_1w1r_1x160.v
../../ds3_src/wrapper/memory/beh/mem_1w1r_1x128.v
../../us_src/wrapper/memory/beh/mem_1w1r_128x8.v
../../src/sw/mem_1w1r_8x31.v
../../src/sw/mem_1w1r_8x35.v

Upvotes: 1

vladmihaisima

Reputation: 2248

: > output.txt
while read -r F; do
    N=$(basename "$F")
    grep -qF "$N" output.txt || echo "$F" >> output.txt
done

Of course you can put it all on one line.
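A minimal one-line sketch of the same loop, assuming the list is read from a file named rtl_list (the name comes from the question; adjust it to your actual file):

: > output.txt; while read -r F; do grep -qF "$(basename "$F")" output.txt || echo "$F" >> output.txt; done < rtl_list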

Also, I think the expected output you show is not correct: running this on your input file I get a different result (your expected output does not include all the files from the input).

Upvotes: 0

catalin.me

Reputation: 111

Maybe you could use Python to keep a mapping between each filename and its existing file paths, like (key, [path1, path2 ...]). Then you could sort the possible file paths for each name and use only the first one.

import os.path
import fileinput

# dictionary to hold (filename, [path1, path2 ...])
file_paths = {}

for line in fileinput.input("input.txt"):
    value = line.strip('\n')
    base = os.path.basename(value)

    if base in file_paths:
        # add another path to the existing key
        file_paths[base].append(value)
    else:
        # init another bucket
        file_paths[base] = [value]

# print only the first path for each filename
for key in file_paths.keys():
    sl = sorted(file_paths[key])
    print(sl[0])

Upvotes: 0
