dma_k

Reputation: 10639

sed script to remove file name duplicates

I hope the task below will be easy for sed lovers. I am not a sed guru, but I need to express it in sed, since sed is more widely available on Linux systems.

The input text stream is produced by "make depends" and looks like the following:

pgm2asc.o: pgm2asc.c ../include/config.h amiga.h list.h pgm2asc.h pnm.h \
 output.h gocr.h unicode.h ocr1.h ocr0.h otsu.h barcode.h progress.h
box.o: box.c gocr.h pnm.h ../include/config.h unicode.h list.h pgm2asc.h \
 output.h
database.o: database.c gocr.h pnm.h ../include/config.h unicode.h list.h \
 pgm2asc.h output.h
detect.o: detect.c pgm2asc.h pnm.h ../include/config.h output.h gocr.h \
 unicode.h list.h

I need to pick out only the header files (i.e. names ending in .h), make the list unique, and print it as a space-separated list with src/ prepended to each path. The following perl script does this:

make libs-depends | perl -e 'while (<>) { while (/ ([\w\.\/]+?\.h)/g) { $a{$1} = 1; } } print join " ", map { "src/$_" } keys %a;'

The output is:

src/unicode.h src/pnm.h src/progress.h src/amiga.h src/ocr0.h src/ocr1.h src/otsu.h src/barcode.h src/gocr.h src/../include/config.h src/list.h src/pgm2asc.h src/output.h

Please help me express this in sed.

Upvotes: 0

Views: 367

Answers (3)

Beta

Reputation: 99144

If you really want to do this in pure sed:

make libs-depends | sed 's/ /\n/g' | sed '/\.h$/!d;s/^/src\//' | sed 'G;/^\(.*\)\n.*\1/!h;$!d;${x;s/\n/ /g}'

The first sed command breaks the output up into separate lines, the second filters out everything but *.h and prepends 'src/', the third gloms the lines together without repetition.
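To see what each stage contributes, here is a small sketch that runs the same three sed commands on two of the rules from the question. The function names (sample_deps, split_and_filter, dedupe_and_join) are made up for illustration, and it assumes GNU sed, since \n in the replacement text is not portable to every sed.

```shell
# Hypothetical stand-in for `make libs-depends`: two rules from the question.
sample_deps() {
  printf '%s\n' \
    'box.o: box.c gocr.h pnm.h ../include/config.h unicode.h list.h pgm2asc.h \' \
    ' output.h' \
    'detect.o: detect.c pgm2asc.h pnm.h ../include/config.h output.h gocr.h \' \
    ' unicode.h list.h'
}

# Stages 1 and 2: put one token per line, drop everything that does not
# end in .h, and prepend src/ to what is left.
split_and_filter() {
  sed 's/ /\n/g' | sed '/\.h$/!d;s/^/src\//'
}

# Stage 3: G appends the hold space (names seen so far) to the current line.
# If the current name does not reappear in that list, h saves the extended
# list. Every line but the last is deleted; on the last line the saved list
# is swapped back in and its newlines are turned into spaces.
dedupe_and_join() {
  sed 'G;/^\(.*\)\n.*\1/!h;$!d;${x;s/\n/ /g}'
}

sample_deps | split_and_filter | dedupe_and_join
```

Because each new name is placed in front of the saved list, the headers come out in reverse order of first appearance, with a trailing space at the end of the line. Note also that the duplicate test is a substring match, so a name that happens to be contained in an earlier, longer name would be treated as already seen.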

Upvotes: 1

Anton

Reputation: 2693

Not sed, but I hope this helps:

make libs-depends | grep -io --perl-regexp '[\w./]+\.h\b' | sort -u | sed -e 's:^:src/:'

Upvotes: 2

pdbartlett

Reputation: 1519

Sed probably isn't the best tool here as it's stream-oriented. You could possibly use it to convert the spaces to newlines though, pipe that through sort and uniq, then use sed again to convert the newlines back to spaces.

Typing this on my phone, though, so can't give exact commands :(
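The approach described above could be sketched roughly as follows. This is only an illustration, not tested against the real make output: unique_headers is a made-up helper name, the sample input is a shortened rule in the style of the question, and GNU sed is assumed (for \n in the replacement and for the one-line join loop).

```shell
# Hypothetical helper: reads dependency rules on stdin and prints the
# unique header names, each prefixed with src/, on one space-separated line.
# Steps: split on spaces (GNU sed), keep only names ending in .h and
# prefix them, sort and drop duplicates, then join the lines with spaces.
unique_headers() {
  sed 's/ /\n/g' |
    sed '/\.h$/!d; s|^|src/|' |
    LC_ALL=C sort | uniq |
    sed ':a;N;$!ba;s/\n/ /g'
}

# Example input: one rule with a continuation line and a repeated header.
printf '%s\n' 'a.o: a.c x.h y.h \' ' x.h' | unique_headers
```

Unlike a pure sed solution, this version emits the names in sorted order, which may or may not matter for the makefile that consumes the list.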

Upvotes: 0
