Reputation: 1842
I am trying to identify file extensions from a list of filenames extracted from a floppy disk image. The problem is different from this example where files are already extracted from the disk image. I'm new to gawk, so maybe it is not the right tool.
ls Sounddsk2.img -a1 > allfilenames
The command above creates the list of filenames shown below.
flute.pt
flute.ss
flute.vc
guitar.pt
guitar.ss
guitar.vc
The gawk command below identifies files ending in .ss:
cat allfilenames | gawk '/[fluteguitar].ss/' > ssfilenames
This would be fine when there are just a few known file names. How do I specify a file prefix in a more generic form?
Upvotes: 2
Views: 377
Reputation: 2761
The regex you came up with, /[fluteguitar].ss/, matches any line that contains one of the characters f, l, u, e, g, i, t, a or r (a bracket expression [...] lists single characters, and duplicated characters count only once), followed by any single character (an unescaped dot . matches anything except a newline here), followed by ss, anywhere in the line.
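To see the over-matching described above, here is a small sketch on a made-up list. The names glass.txt and x.ss are hypothetical and not from the question; plain awk is used, and gawk should behave the same way.

```shell
# Hypothetical sample list; glass.txt and x.ss are invented for illustration.
names=$(printf '%s\n' flute.ss guitar.ss glass.txt x.ss)

# [fluteguitar] is a set of single characters, and the bare dot is a wildcard,
# so "lass" inside glass.txt matches while x.ss does not.
loose=$(printf '%s\n' "$names" | awk '/[fluteguitar].ss/')
printf '%s\n' "$loose"
```

This prints flute.ss, guitar.ss and glass.txt: the pattern accepts a name that is not a .ss file and rejects x.ss, which is one.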
You need to restrict the match with the start-of-line ^ and end-of-line $ anchors, and use a group for the alternatives:
awk '/^(flute|guitar)\.ss$/' allfilenames > ssfilenames
This keeps only the file names flute.ss and/or guitar.ss. The group (...|...) matches any one of the regexps separated by the pipe, which acts as a logical OR.
If these are just prefixes and you want to match any file that begins with them and ends with .ss, use:
awk '/^(flute|guitar).*\.ss$/' allfilenames > ssfilenames
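As a rough side-by-side of the exact and prefix patterns, the sketch below runs both on a made-up list. The names flute2.ss and bassguitar.ss are hypothetical additions for illustration only.

```shell
# Hypothetical sample list; flute2.ss and bassguitar.ss are invented.
names=$(printf '%s\n' flute.ss flute2.ss guitar.ss guitar.vc bassguitar.ss)

# Exact names only: the whole line must be flute.ss or guitar.ss.
exact=$(printf '%s\n' "$names" | awk '/^(flute|guitar)\.ss$/')

# Prefix form: the line must start with flute or guitar and end in .ss.
prefix=$(printf '%s\n' "$names" | awk '/^(flute|guitar).*\.ss$/')

printf 'exact:\n%s\n\nprefix:\n%s\n' "$exact" "$prefix"
```

The exact pattern keeps flute.ss and guitar.ss. The prefix pattern additionally keeps flute2.ss, and both reject bassguitar.ss because of the ^ anchor.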
Upvotes: 1
Reputation: 163207
You might also use grep with -E for extended regexps and an alternation to match either flute or guitar.
ls Sounddsk2.img -a1 | grep -E "^(flute|guitar)\.ss$" > ssfilenames
The pattern matches:
^  Start of string
(flute|guitar)  Match either flute or guitar
\.ss  Match .ss
$  End of string
The file ssfilenames contains:
flute.ss
guitar.ss
Upvotes: 1
Reputation: 133428
Please use the find command to match file names. With your shown samples you could try the following. You can run this command on the directory itself, so you need not store the file names in a file and then run awk on it.
find . -regextype egrep -regex '.*/(flute|guitar)\.ss$'
Explanation: this uses the ability of find (GNU find) to select a regex flavour with -regextype (egrep style here), and gives it a regex that matches the file names flute OR guitar and makes sure they end with .ss.
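If the files are available in a directory and only the extension matters, a simpler variant can be sketched with the standard -name test instead of a regex. The temporary directory and the file names created in it are assumptions made purely for this illustration.

```shell
# Build a throwaway directory with hypothetical sample files.
dir=$(mktemp -d)
touch "$dir/flute.ss" "$dir/flute.vc" "$dir/guitar.ss" "$dir/guitar.pt"

# -name takes a shell glob matched against the file's base name.
# sed strips the directory part; sort makes the order predictable.
found=$(find "$dir" -type f -name '*.ss' | sed 's|.*/||' | sort)
printf '%s\n' "$found"

# Clean up the throwaway directory.
rm -r "$dir"
```

This prints flute.ss and guitar.ss. Unlike -regextype, the -name test does not depend on GNU find.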
Upvotes: 1
Reputation: 1842
Unless someone can suggest a better one, this seems to be the most generic way to express this. It will work for any filename prefix spelt with uppercase letters, lowercase letters and numbers:
cat allfilenames | gawk '/[a-zA-Z0-9].ss/' > ssfilenames
Edit
αғsнιη's first suggested answer and jetchisel's comment prompted me to try using gawk without using cat.
gawk '/^[a-zA-Z0-9]+\.ss$/' allfilenames > ssfilenames
and this also worked
gawk '/[a-zA-Z0-9]\.ss/' allfilenames > ssfilenames
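Another way to stay generic, offered only as a sketch, is to let awk split each name on the dot and compare the last field, so the prefix never has to be described at all. It assumes the extension is whatever follows the final dot, and it uses plain awk (gawk should behave the same).

```shell
# The file list from the question.
names=$(printf '%s\n' flute.pt flute.ss flute.vc guitar.pt guitar.ss guitar.vc)

# -F. makes the dot the field separator; $NF is the last field (the extension).
# NF > 1 skips names that contain no dot.
ss=$(printf '%s\n' "$names" | awk -F. 'NF > 1 && $NF == "ss"')
printf '%s\n' "$ss"

# The same split can tally how many files carry each extension.
counts=$(printf '%s\n' "$names" | awk -F. 'NF > 1 { n[$NF]++ } END { for (e in n) print e, n[e] }' | sort)
printf '%s\n' "$counts"
```

The first command prints flute.ss and guitar.ss, and the tally prints pt 2, ss 2 and vc 2.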
Upvotes: 1