Reputation: 65
I have about 50K files in a directory (Linux OS), and they follow the naming convention USER_ID.ORACLE_JOB_ID.SEQUENCED_NUMBER.pdf.
I need to list all unique ORACLE_JOB_IDs in a text file. How can this be done?
PS: Forgot to mention that there are some other files in the same directory which follow a different naming convention, and I have to avoid them.
Thanks!
Examples:
1. 6778390.done
2. o6778390.out
3. AWRX_GBL_FAR1.98567432.4.dat.xml
4. AWRX_GBL_FAR1.34789214.4.pdf
Upvotes: 3
Views: 7724
Reputation: 140629
In the spirit of "there's more than one way to do it," here is a perl one-liner which is functionally equivalent to qwwqwwq's shell pipeline:
perl -le 'my %seen; print for sort grep !$seen{$_}++, map { (split /\./)[1] } <*>'
<*> can be replaced with any glob expression, e.g. <*.pdf> to operate only on files whose names end in .pdf.
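For example, restricting the same one-liner to the .pdf files (this sketch only swaps in the glob mentioned above; nothing else changes):
perl -le 'my %seen; print for sort grep !$seen{$_}++, map { (split /\./)[1] } <*.pdf>'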
Upvotes: 2
Reputation: 7309
ls | awk 'BEGIN{FS="."}{ print $2 }' | sort | uniq > file.txt
ls : get the list of all file names in the current directory
awk : split each file name on the field separator "." and print only the second field
sort : sort those second fields so that duplicates become adjacent
uniq : remove consecutive identical lines
A short trace on the question's example file names is shown below.
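For illustration, here is a sketch of what the pipeline produces, assuming the current directory contained only the four example files from the question; the non-matching files leak into the output, which is why the .pdf filter in the EDIT below matters:
$ ls | awk 'BEGIN{FS="."}{ print $2 }' | sort | uniq
34789214
98567432
done
out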
EDIT: if you want to limit this to just the .pdf files in the current directory (without descending into subdirectories), use:
find . -maxdepth 1 -iname '*.pdf' | awk 'BEGIN{FS="."}{ print $3 }' | sort | uniq > file.txt
(the job ID moves to $3 here because find prefixes each name with ./, which adds an extra "."-delimited field).
Using ls *.pdf when there are many, many PDFs in the current directory will fail, as the error shows, because it is equivalent to calling ls with 50K separate command-line arguments, exceeding the kernel's argument-length limit (ARG_MAX).
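If you prefer to avoid find altogether, one workaround is to have the shell itself print the expanded glob (a minimal sketch: printf is a shell builtin, so the expanded file list never passes through execve and the ARG_MAX limit does not apply):
printf '%s\n' *.pdf | awk 'BEGIN{FS="."}{ print $2 }' | sort | uniq > file.txt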
Upvotes: 8