Reputation: 65
I have about 50K files in a directory (Linux OS), and they follow the naming convention USER_ID.ORACLE_JOB_ID.SEQUENCED_NUMBER.pdf.
I need to list all unique ORACLE_JOB_IDs in a text file. How can this be done?
PS: Forgot to mention that there are some other files in the same directory which follow a different naming convention, and I have to avoid them.
Thanks!
Examples:
1. 6778390.done
2. o6778390.out
3. AWRX_GBL_FAR1.98567432.4.dat.xml
4. AWRX_GBL_FAR1.34789214.4.pdf
Upvotes: 3
Views: 7724
Reputation: 140629
In the spirit of "there's more than one way to do it," here is a perl one-liner which is functionally equivalent to qwwqwwq's shell pipeline:
perl -le 'my %seen; print for sort grep !$seen{$_}++, map { (split /\./)[1] } <*>'
<*> can be replaced with any glob expression, e.g. <*.pdf> to operate only on files whose names end in .pdf.
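For example, restricting the same one-liner to the .pdf files (this sketch only swaps in the glob mentioned above; nothing else changes):
perl -le 'my %seen; print for sort grep !$seen{$_}++, map { (split /\./)[1] } <*.pdf>'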
Upvotes: 2
Reputation: 7309
ls | awk 'BEGIN{FS="."}{ print $2 }' | sort | uniq > file.txt
ls : get the list of all file names in the current directory
awk : split each file name on the field separator "." and print only the second field
sort : sort those second fields so that duplicates become adjacent
uniq : remove consecutive identical lines
A short trace on the question's example file names is shown below.
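For illustration, here is a sketch of what the pipeline produces, assuming the current directory contained only the four example files from the question; the non-matching files leak into the output, which is why the .pdf filter in the EDIT below matters:
$ ls | awk 'BEGIN{FS="."}{ print $2 }' | sort | uniq
34789214
98567432
done
out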
EDIT: if you want to limit this to just the .pdf files in the current directory (without descending into subdirectories), use:
find . -maxdepth 1 -iname '*.pdf' | awk 'BEGIN{FS="."}{ print $3 }' | sort | uniq > file.txt
(the job ID moves to $3 here because find prefixes each name with ./, which adds an extra "."-delimited field).
Using ls *.pdf when there are many, many PDFs in the current directory will fail, as the error shows, because it is equivalent to calling ls with 50K separate command-line arguments, exceeding the kernel's argument-length limit (ARG_MAX).
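If you prefer to avoid find altogether, one workaround is to have the shell itself print the expanded glob (a minimal sketch: printf is a shell builtin, so the expanded file list never passes through execve and the ARG_MAX limit does not apply):
printf '%s\n' *.pdf | awk 'BEGIN{FS="."}{ print $2 }' | sort | uniq > file.txt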
Upvotes: 8