Greg

Reputation: 1842

How to specify a file prefix in gawk

I am trying to identify file extensions from a list of filenames extracted from a floppy disk image. The problem is different from this example where files are already extracted from the disk image. I'm new to gawk so maybe it is not the right tool.

ls Sounddsk2.img -a1 > allfilenames

The command above creates the list of filenames shown below.

flute.pt
flute.ss
flute.vc
guitar.pt
guitar.ss
guitar.vc

The gawk command below identifies files ending in .ss

cat allfilenames | gawk '/[fluteguitar].ss/' > ssfilenames

This would be fine when there are just a few known file names. How do I specify a file prefix in a more generic form?

Upvotes: 2

Views: 377

Answers (4)

αғsнιη

Reputation: 2761

The regex you came up with, /[fluteguitar].ss/, matches any line that contains one of the characters f, l, u, t, e, g, i, a or r (a bracket expression [...] is a set, so duplicated characters count only once), followed by any single character (an unescaped dot . matches any character), followed by ss. Because nothing anchors the pattern, that sequence can occur anywhere in the line.

You need to restrict the match with the start-of-line ^ and end-of-line $ anchors, and use a group with alternation:

awk '/^(flute|guitar)\.ss$/' allfilenames > ssfilenames

This keeps only the two file names flute.ss and guitar.ss. The group (...|...) matches any one of the regexps separated by the pipe, which acts as a logical OR.
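To make the difference concrete, here is a minimal sketch that runs the unanchored bracket expression and the anchored alternation over the same input. The names glass and flute.ssx are hypothetical, added only to show false positives; awk is used so the sketch also runs where gawk is not installed, and gawk accepts the same patterns.

```shell
# Hypothetical sample names; only flute.ss and guitar.ss are wanted.
names='flute.ss
guitar.ss
glass
flute.ssx'

# Unanchored bracket expression from the question: one listed character,
# then any character, then "ss", anywhere in the line.
loose=$(printf '%s\n' "$names" | awk '/[fluteguitar].ss/')

# Anchored alternation with an escaped dot.
strict=$(printf '%s\n' "$names" | awk '/^(flute|guitar)\.ss$/')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The loose pattern lets all four names through, while the strict one keeps only flute.ss and guitar.ss.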

If these are only prefixes, and you want to match any file name that begins with one of them and ends with .ss, use:

awk '/^(flute|guitar).*\.ss$/' allfilenames > ssfilenames
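If the prefix is only known at run time, one possible approach is to pass it in as an awk variable and compare it literally with index(), so that no regex escaping is needed. This is a sketch under assumptions: the variable name prefix and the sample name flute2.ss are illustrative, not taken from the question.

```shell
# Hypothetical list; "prefix" is an illustrative awk variable name.
names='flute.ss
flute2.ss
flute.pt
guitar.ss'

# index($0, prefix) == 1 is true when the line starts with the literal
# prefix; the regex then requires the line to end in ".ss".
matched=$(printf '%s\n' "$names" |
    awk -v prefix='flute' 'index($0, prefix) == 1 && /\.ss$/')

printf '%s\n' "$matched"
```

Changing the value given to -v prefix selects a different family of files without editing the awk program.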

Upvotes: 1

The fourth bird

Reputation: 163207

You might also use grep with -E for extended regular expressions, with an alternation to match either flute or guitar.

ls Sounddsk2.img -a1 | grep -E "^(flute|guitar)\.ss$" > ssfilenames

The pattern matches:

  • ^ Start of string
  • (flute|guitar) Match either flute or guitar
  • \.ss Match .ss
  • $ End of string

The file ssfilenames contains:

flute.ss
guitar.ss

Upvotes: 1

RavinderSingh13

Reputation: 133428

Please use the find command to match file names. With your shown samples, you could try the following. You can run this command on the directory itself, so there is no need to store the file names in a file and then run awk on it.

find . -regextype egrep -regex '.*/(flute|guitar)\.ss$'

Explanation: find accepts a -regextype option (egrep style here). The regex matches a path whose file name is flute or guitar and makes sure it ends with .ss.
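Note that -regextype is a GNU find extension. Where it is not available, a rough equivalent can be sketched with -name, which takes shell glob patterns rather than regexes. The temporary directory below is only a stand-in for a directory holding the extracted files.

```shell
# Throwaway directory holding the sample file names from the question.
dir=$(mktemp -d)
for f in flute.pt flute.ss flute.vc guitar.pt guitar.ss guitar.vc; do
    : > "$dir/$f"
done

# -name matches the base name against a glob; -o joins the alternatives.
# sed strips the directory part and sort makes the order predictable.
found=$(find "$dir" -type f \( -name 'flute.ss' -o -name 'guitar.ss' \) |
    sed 's|.*/||' | sort)

printf '%s\n' "$found"
rm -rf "$dir"
```

Replacing the two -name tests with a single -name '*.ss' would pick up every .ss file regardless of prefix.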

Upvotes: 1

Greg

Reputation: 1842

Unless someone can suggest a better one, this seems to be the most generic way to express it. It will work for any filename prefix spelt with uppercase letters, lowercase letters and digits.

cat allfilenames | gawk '/[a-zA-Z0-9].ss/' > ssfilenames

Edit

αғsнιη's first suggested answer and jetchisel's comment prompted me to try using gawk without using cat.

gawk '/^[a-zA-Z0-9]+\.ss$/' allfilenames > ssfilenames

and this also worked

gawk '/[a-zA-Z0-9]\.ss/' allfilenames > ssfilenames
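How the anchors and the + quantifier interact for alphanumeric prefixes can be checked with a small sketch. The names a.ss and flute.ssx are hypothetical test cases; awk is used so the sketch runs without gawk, which accepts the same patterns.

```shell
# Hypothetical sample names covering the edge cases.
names='flute.ss
guitar.ss
a.ss
flute.ssx
flute.pt'

# Anchored, no quantifier: the prefix must be exactly one character.
single=$(printf '%s\n' "$names" | awk '/^[a-zA-Z0-9]\.ss$/')

# Anchored with +: the prefix is one or more alphanumeric characters.
any=$(printf '%s\n' "$names" | awk '/^[a-zA-Z0-9]+\.ss$/')

printf 'single:\n%s\n\nany:\n%s\n' "$single" "$any"
```

Without the quantifier only a.ss matches. With + the three .ss names match and flute.ssx is rejected, which an unanchored pattern would let through.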

Upvotes: 1
