Nitin

Reputation: 65

Linux: list unique part of file name

I have about 50K files in a directory (Linux) that follow the naming convention USER_ID.ORACLE_JOB_ID.SEQUENCED_NUMBER.pdf

I need to list all unique ORACLE_JOB_ID in a text file. How can this be done?

PS: I forgot to mention that the same directory also contains other files with different naming conventions, and I need to exclude them.

Thanks!

Examples:

1. 6778390.done
2. o6778390.out
3. AWRX_GBL_FAR1.98567432.4.dat.xml
4. AWRX_GBL_FAR1.34789214.4.pdf

Upvotes: 3

Views: 7724

Answers (2)

zwol

Reputation: 140629

In the spirit of "there's more than one way to do it," here is a Perl one-liner that is functionally equivalent to qwwqwwq's shell pipeline:

perl -le 'my %seen; print for sort grep !$seen{$_}++, map { (split /\./)[1] } <*>'

<*> can be replaced with any glob expression, e.g. <*.pdf> to operate only on files whose names end with .pdf.
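In the same spirit, here is a rough plain-shell sketch of the same idea restricted to *.pdf. The function name job_ids_from_glob is made up for illustration. It assumes a POSIX shell and that every matching name has at least two dots, as in the question's convention. Because the glob is expanded inside a for loop rather than passed to an external command, no argument list is handed to another program.

```shell
# Hypothetical helper: print each distinct second dot-separated field
# of the *.pdf names in the current directory.
job_ids_from_glob() {
    for f in *.pdf; do
        [ -e "$f" ] || continue        # glob matched nothing: skip the literal pattern
        rest=${f#*.}                   # drop everything up to and including the first dot
        printf '%s\n' "${rest%%.*}"    # keep the text before the next dot
    done | sort -u                     # sort and drop duplicates in one step
}
```

Running job_ids_from_glob > file.txt in the directory would write the result to a file, as in the other answer.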

Upvotes: 2

qwwqwwq

Reputation: 7309

ls | awk 'BEGIN{FS="."}{ print $2 }' | sort | uniq > file.txt

ls: list all file names in the current directory

awk: split each file name on the field separator "." and print only the second field

sort: sort the resulting values

uniq: remove consecutive identical lines, which after sorting removes every duplicate
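The steps above can be tried on a few sample names without touching the real directory. This is only a sketch: unique_second_field is a made-up helper name, printf stands in for ls, and the third sample name is invented to show duplicates collapsing. It also uses awk -F. and sort -u as shorter spellings of the BEGIN{FS="."} and sort | uniq steps.

```shell
# Hypothetical helper: read file names on stdin and print each distinct
# second dot-separated field once.
unique_second_field() {
    awk -F. '{ print $2 }' | sort -u
}

# The first two names come from the question; the third is invented.
printf '%s\n' \
    'AWRX_GBL_FAR1.98567432.4.dat.xml' \
    'AWRX_GBL_FAR1.34789214.4.pdf' \
    'AWRX_GBL_FAR1.34789214.5.pdf' | unique_second_field
```

This prints 34789214 followed by 98567432. Note that the .dat.xml name still contributes a value, which is why the pipeline alone does not exclude the other files in the directory.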

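EDIT: if you want to limit this to the files ending in .pdf, use the command below. find prefixes each name with "./", so the job ID becomes the third field instead of the second. Note that find also descends into subdirectories unless told otherwise (with GNU find, -maxdepth 1 prevents that).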

find . -iname '*.pdf' | awk 'BEGIN{FS="."}{ print $3 }' | sort | uniq > file.txt

Using ls *.pdf when there are very many PDFs in the current directory fails with an "Argument list too long" error, as the error shows. The shell expands the glob before ls starts, so it's equivalent to calling ls with 50K separate command-line arguments, which exceeds the system's limit on argument length.
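To also skip files that end in .pdf but do not follow the USER_ID.ORACLE_JOB_ID.SEQUENCED_NUMBER.pdf convention, the awk step can check the field count. The sketch below is one possible variant, not part of the original answer. The function name list_job_ids is made up, and it assumes GNU or BSD find (for -maxdepth), purely numeric job IDs as in the question's examples, and no dots inside USER_ID.

```shell
# Hypothetical helper: print the distinct job IDs of convention-following
# PDFs in directory $1 (default: current directory).
list_job_ids() {
    dir=${1:-.}
    # cd in a subshell so every path printed by find starts with "./"
    # and dots in the directory name cannot shift the field positions.
    ( cd "$dir" &&
      find . -maxdepth 1 -type f -name '*.pdf' |
      # "./USER.JOB.SEQ.pdf" splits into 5 fields: "", "/USER", JOB, SEQ, "pdf"
      awk -F. 'NF == 5 && $3 ~ /^[0-9]+$/ { print $3 }' |
      sort -u )
}
```

Calling list_job_ids . > file.txt would then write one job ID per line, as in the commands above.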

Upvotes: 8
