Reputation: 1
I have a file with many such lines containing paths of some files.
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x160.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x134.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x178.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_20x68.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_280x128.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_8x160.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_8x134.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_64x7.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_24x128.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_8x178.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_20x68.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_280x128.v
../../ds3_src/wrapper/memory/beh/mem_1w1r_1x160.v
../../ds3_src/wrapper/memory/beh/mem_1w1r_1x128.v
../../us_src/wrapper/memory/beh/mem_1w1r_128x8.v
../../us_src/wrapper/memory/beh/mem_1w1r_8x160.v
../../us_src/wrapper/memory/beh/mem_1w1r_8x178.v
../../us_src/wrapper/memory/beh/mem_1w1r_20x68.v
../../us_src/wrapper/memory/beh/mem_1w1r_280x128.v
../../src/sw/mem_1w1r_8x31.v
../../src/sw/mem_1w1r_8x35.v
Some of these are duplicate files.
I want this list sorted and uniquified so that lines referring to a duplicate file are removed.
% grep -r "mem_1w" rtl_list | awk '{split($0,a,"/"); print a[7]}' | sort -u
I can do something like the above to uniquify the file names, but that chops off the first part of each line (../../ds1_src/... etc.). Also, duplicates are possible at a[5] as well, as in ../../src/sw/mem_1w1r_8x31.v.
mem_1w1r_128x8.v
mem_1w1r_145x133.v
mem_1w1r_1x128.v
mem_1w1r_1x160.v
mem_1w1r_20x68.v
mem_1w1r_24x128.v
mem_1w1r_280x128.v
mem_1w1r_64x7.v
mem_1w1r_73x133.v
mem_1w1r_8x134.v
mem_1w1r_8x160.v
mem_1w1r_8x178.v
How can I sort this out to get something like this:
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x160.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x134.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_64x7.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_1x160.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_1x128.v
which would remove any duplicate file that exists in a different location?
Upvotes: 0
Views: 242
Reputation: 9651
You could use the following to make the list unique by filename:
awk -F/ '!F[$NF] && (F[$NF]=$0)'
This will only include the path to the first file with each unique name. You can then pipe the result through sort, but there is no need for the -u option.
This works by building an array in awk. Each index is the filename without the path (with the -F/ option, the filename is simply $NF). Each element is the full path to the first file with that filename (the full path is $0). A new array entry is added, and its line printed, only if the filename has not been seen before.
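For readers less used to awk, the same bookkeeping can be sketched in Python. This is only an illustration of the array described above, not a replacement for the one-liner; the function name first_path_per_name and the sample list are made up for the example.

```python
import posixpath


def first_path_per_name(paths):
    """Keep the first path seen for each file name (hypothetical helper)."""
    first_seen = {}  # file name -> full path of its first occurrence
    for path in paths:
        name = posixpath.basename(path)  # plays the role of $NF with -F/
        if name not in first_seen:
            first_seen[name] = path  # same idea as F[$NF] = $0
    # dicts keep insertion order, so this is first-seen order
    return list(first_seen.values())


# Sample lines taken from the question's input
sample = [
    "../../ds1_src/wrapper/memory/beh/mem_1w1r_8x160.v",
    "../../ds2_src/wrapper/memory/beh/mem_1w1r_8x160.v",
    "../../ds2_src/wrapper/memory/beh/mem_1w1r_64x7.v",
    "../../src/sw/mem_1w1r_8x31.v",
]

# Sorting afterwards corresponds to piping the awk output through sort
for path in sorted(first_path_per_name(sample)):
    print(path)
```

The ds2_src copy of mem_1w1r_8x160.v is dropped because the ds1_src copy was seen first.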
Upvotes: 0
Reputation: 54402
Here's one way using awk:
awk -F "/" '/mem_1w/ && !a[$NF]++' file
Results:
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x160.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x134.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_8x178.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_20x68.v
../../ds1_src/wrapper/memory/beh/mem_1w1r_280x128.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_64x7.v
../../ds2_src/wrapper/memory/beh/mem_1w1r_24x128.v
../../ds3_src/wrapper/memory/beh/mem_1w1r_1x160.v
../../ds3_src/wrapper/memory/beh/mem_1w1r_1x128.v
../../us_src/wrapper/memory/beh/mem_1w1r_128x8.v
../../src/sw/mem_1w1r_8x31.v
../../src/sw/mem_1w1r_8x35.v
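The one-liner relies on a per-filename counter: !a[$NF]++ tests the count before incrementing it, so the test is true only the first time a name appears. A rough Python sketch of that logic follows, assuming a plain substring test is an acceptable stand-in for the /mem_1w/ regex; the name unique_by_name and the sample list are hypothetical.

```python
from collections import defaultdict


def unique_by_name(lines, pattern="mem_1w"):
    """Return matching lines whose file name has not been seen before."""
    seen = defaultdict(int)  # file name -> times seen so far, like a[$NF]
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if pattern not in line:
            continue  # mirrors the /mem_1w/ filter
        name = line.split("/")[-1]  # $NF when the separator is "/"
        if seen[name] == 0:  # !a[$NF] holds only on the first sighting
            kept.append(line)
        seen[name] += 1  # the ++ part, applied after the test
    return kept


# Sample lines taken from the question's input, plus one non-matching line
sample = [
    "../../ds1_src/wrapper/memory/beh/mem_1w1r_20x68.v",
    "../../us_src/wrapper/memory/beh/mem_1w1r_20x68.v",
    "../../us_src/wrapper/memory/beh/mem_1w1r_128x8.v",
    "some/other/file.v",
]
print("\n".join(unique_by_name(sample)))
```

As in the awk version, the original input order is preserved, which is why the results above are not sorted.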
Upvotes: 1
Reputation: 2248
: > output.txt; while read -r F; do N=$(basename "$F"); grep -qF "/$N" output.txt || echo "$F" >> output.txt; done < rtl_list
Of course, you can put it all on one line.
I also think your expected output does not match your input file: some of the files you list are not in the input.
Upvotes: 0
Reputation: 111
Maybe you could use Python to keep a mapping between each filename and its existing file paths, like (key, [path1, path2 ...]). Then you could sort the candidate paths for each filename and use only the first one.
import os.path
import fileinput

# dictionary to hold (key, [path1, path2 ...])
file_paths = {}
for line in fileinput.input("input.txt"):
    value = line.strip()
    if not value:
        # skip blank lines
        continue
    # take the basename of the stripped line so no newline ends up in the key
    base = os.path.basename(value)
    if base in file_paths:
        # add another value to the existing key
        file_paths[base].append(value)
    else:
        # init another bucket
        file_paths[base] = [value]

# print only the first value (in sorted order) for each filename
for key in file_paths:
    sl = sorted(file_paths[key])
    print(sl[0])
Upvotes: 0