Reputation: 51

Grep/Sed/Awk Options

How can I use grep, sed, or awk to extract a substring of varying length? Here are some examples:

I need to strip everything except the "XXXXX.WAV" part from these strings, but the strings are not a fixed length.

Sometimes it's like this:

{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/imaging/19307.WAV"},
{"filename": "/assets/JFM/imaging/19002.WAV"}

And sometimes like this:

 {"filename": "/assets/JFM/LN_405999/101.WAV"},
 {"filename": "/assets/JFM/LN_405999/102.WAV"},
 {"filename": "/assets/JFM/LN_405999/103.WAV"}

Is there a good dynamic way to extract just the .WAV file name? Maybe by starting at the last "/" and reading up to the closing "?

Edit:

Expected output like this:

19001.WAV
19307.WAV
19002.WAV

Or:

101.WAV
102.WAV
103.WAV

Upvotes: 1

Answers (7)

Claes Wikner

Reputation: 1517

awk -F/ '{print substr($5,1,7)}' file

101.WAV
102.WAV
103.WAV
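
Note that substr($5,1,7) assumes a seven-character name in the fifth slash-separated field, which fits the 101.WAV sample but would cut 19001.WAV short. A minimal sketch of a length-independent variant, assuming the file name is always the last slash-separated field and is followed by a closing quote (the function name last_field_name is only for illustration):

```shell
# Hypothetical helper: print the file name from each line read on stdin.
last_field_name() {
  # Split on "/", then strip the closing quote and everything after it
  # from the last field, leaving only the file name.
  awk -F/ '{ name = $NF; sub(/".*/, "", name); print name }'
}

# Try it on one line of each shape from the question.
printf '%s\n' \
  '{"filename": "/assets/JFM/imaging/19001.WAV"},' \
  ' {"filename": "/assets/JFM/LN_405999/101.WAV"}' | last_field_name
```

This should print 19001.WAV and 101.WAV regardless of how long the name or the directory path is.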

Upvotes: 0

Mad Physicist

Reputation: 114470

All of the programs you listed use regex to parse the names, so I will show you an example using grep, being probably the most basic one for this case.

There are a couple of options, depending on the exact way you define the XXX part before the ".wav".

Option 1, as you pointed out, is to take just the file name, i.e., everything after the last slash:

grep -hoi "[^/]\+\.WAV"

This reads as "any character besides slash" ([^/]) repeated at least once (\+), followed by a literal .WAV (\.WAV).

Option 2 would be to only grab the digits before the extension:

grep -hoi "[[:digit:]]\+\.WAV"

grep -hoi "[0-9]\+\.WAV"

These read as "digits" ([[:digit:]] and [0-9] mean the same thing) repeated at least once (\+), followed by a literal .WAV (\.WAV).

In all cases, I recommend using the flags -h, -o, -i, which I have concatenated into a single option -hoi. -h suppresses the file name from the output. -o makes grep only output the portion that matches. -i makes the match case insensitive, so should your extension ever change to .wav instead of .WAV, you'll be fine.

Also, in all cases, the input is up to you. You can pipe it in from another program, which will look like

program | grep -hoi "[^/]\+\.WAV"

You can get it from a file using stdin redirection:

grep -hoi "[^/]\+\.WAV" < somefile.txt

Or you can just pass the filename to grep:

grep -hoi "[^/]\+\.WAV" somefile.txt
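
To see how the two options differ, here is a small sketch with made-up input. The name intro_01.wav and the variable sample are hypothetical, not from the question, and the \+ syntax assumes a grep that supports it in basic regular expressions (GNU grep does):

```shell
# Made-up sample: one numeric name, one with letters and a lowercase extension.
sample='{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/imaging/intro_01.wav"}'

# Option 1: everything after the last slash, through the extension.
printf '%s\n' "$sample" | grep -hoi "[^/]\+\.WAV"

# Option 2: only the run of digits directly before the extension.
printf '%s\n' "$sample" | grep -hoi "[0-9]\+\.WAV"
```

Option 1 should print 19001.WAV and intro_01.wav, while option 2 should print 19001.WAV and 01.wav, because it keeps only the digits. Both match the lowercase extension thanks to -i.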

Upvotes: 1

VIPIN KUMAR

Reputation: 3147

Try this -

awk -F'[{":}/]' '{print $(NF-2)}' f
19001.WAV
19307.WAV
19002.WAV

egrep -o '[[:digit:]]{5}\.WAV' f
19001.WAV
19307.WAV
19002.WAV

egrep -o '[[:digit:]]{5}\.[[:alpha:]]{3}' f
19001.WAV
19307.WAV
19002.WAV

You can adjust the digit and letter counts in the egrep patterns to suit other input, but the awk command works for both cases as written.
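
The {5} quantifier in the egrep patterns matches exactly five digits, so it fits the first sample but not 101.WAV. A minimal sketch that accepts any number of digits instead, assuming a grep that supports -E and -o (the function name digits_wav is only for illustration):

```shell
# Hypothetical helper: print digit-named .WAV files of any length from stdin.
digits_wav() {
  # "+" means one or more digits; "\." matches a literal dot.
  grep -Eo '[[:digit:]]+\.WAV'
}

# Try it on one line of each shape from the question.
printf '%s\n' \
  '{"filename": "/assets/JFM/imaging/19001.WAV"},' \
  ' {"filename": "/assets/JFM/LN_405999/101.WAV"},' | digits_wav
```

This should print 19001.WAV and 101.WAV. The digits in LN_405999 are skipped because they are not followed by .WAV.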

Upvotes: 1

Juan Diego Godoy Robles

Reputation: 14965

Just use grep as proposed in comments:

grep -o '[^/]\{1,\}\.WAV' yourfile

If the wav file always contains numbers, this seems more explicit (same result):

grep -o '[0-9]\{1,\}\.WAV'

Upvotes: 3

karakfa

Reputation: 67527

Another awk option:

awk -F'[/"]' '{print $(NF-1)}' file

19001.WAV
19307.WAV
19002.WAV

Upvotes: 1

Mark Reed

Reputation: 95325

Assuming there are [ and ] lines at the beginning and end of your file, it looks like your input is JSON, in which case I would recommend installing and using jq rather than text-based utilities, and doing something like this:

jq -r '.[]|.filename|split("/")[-1]'

But failing that, any of the tools listed will work just fine.

grep -o '[^/]*\.WAV'

sed -ne 's,.*/\([^/]*\.WAV\).*$,\1,p'

awk -F'"' '/WAV/ {split($4,a,"/"); print a[length(a)]}'

In each case there are a variety of other possible solutions as well.

Upvotes: 2

Jens

Reputation: 72717

Or with sed

$ sed 's,.*/,,; s,".*,,' x
101.WAV
102.WAV
103.WAV

Explanation:

s,.*/,, - delete everything up to and including the rightmost /
s,".*,, - delete everything starting with the leftmost " to the end of the line
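
To check the two steps against both input shapes from the question without creating a file, a small sketch (the function name basename_wav is hypothetical):

```shell
# Hypothetical helper: reduce each line on stdin to its file name.
basename_wav() {
  # The first expression removes everything through the last "/";
  # the second removes the closing quote and the rest of the line.
  sed 's,.*/,,; s,".*,,'
}

printf '%s\n' \
  '{"filename": "/assets/JFM/imaging/19001.WAV"},' \
  ' {"filename": "/assets/JFM/LN_405999/103.WAV"}' | basename_wav
```

This should print 19001.WAV and 103.WAV. Because .* is greedy, the first substitution always consumes up to the rightmost slash, so the depth of the path does not matter.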

Upvotes: 1
