Reputation: 51
How can I use grep, sed, or awk to extract a substring whose length varies?
I need to strip out everything except the "XXXXX.WAV" part of these strings, but the strings are not a fixed length.
Sometimes it's like this:
{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/imaging/19307.WAV"},
{"filename": "/assets/JFM/imaging/19002.WAV"}
And sometimes like this:
{"filename": "/assets/JFM/LN_405999/101.WAV"},
{"filename": "/assets/JFM/LN_405999/102.WAV"},
{"filename": "/assets/JFM/LN_405999/103.WAV"}
Is there a good way to extract just the .WAV file name regardless of its length? Maybe by starting at the last "/" and reading up to the closing quote (")?
Edit:
Expected output like this:
19001.WAV
19307.WAV
19002.WAV
Or:
101.WAV
102.WAV
103.WAV
Upvotes: 1
Views: 215
Reputation: 1517
awk -F/ '{sub(/".*/, "", $NF); print $NF}' file
101.WAV
102.WAV
103.WAV
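As a quick check on both path shapes from the question, here is a minimal sketch that takes the last /-separated field and drops everything from the closing quote onward. It assumes the name is always the final path component; the sample and names variables are only for illustration.

```shell
# One line of each shape from the question, held in a variable for illustration.
sample='{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/LN_405999/101.WAV"}'

# Take the last /-separated field, then remove the closing quote and what follows it.
names=$(printf '%s\n' "$sample" | awk -F/ '{ name = $NF; sub(/".*/, "", name); print name }')
printf '%s\n' "$names"
```

Because nothing here depends on a character count, the five-digit and three-digit names are handled the same way.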
Upvotes: 0
Reputation: 114470
All of the programs you listed use regular expressions to parse the names, so I will show you an example using grep, which is probably the most basic one for this case.
There are a couple of options, depending on exactly how you define the XXX part before the ".WAV".
Option 1, as you pointed out, is just the file name, i.e., everything after the last slash:
grep -hoi "[^/]\+\.WAV"
This reads as "any character besides slash" ([^/]) repeated at least once (\+), followed by a literal .WAV (\.WAV).
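To try Option 1 on both path shapes from the question, a small sketch along these lines should do. It assumes GNU grep (for \+ and -o), and the sample and opt1 variables are only for illustration.

```shell
# One line of each shape from the question.
sample='{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/LN_405999/101.WAV"}'

# Option 1: a run of non-slash characters ending in .WAV.
opt1=$(printf '%s\n' "$sample" | grep -hoi "[^/]\+\.WAV")
printf '%s\n' "$opt1"
```

The digits inside LN_405999 are not picked up, because they are followed by a slash rather than by .WAV.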
Option 2 would be to only grab the digits before the extension:
grep -hoi "[[:digit:]]\+\.WAV"
OR
grep -hoi "[0-9]\+\.WAV"
These read as "digits" ([[:digit:]] and [0-9] mean the same thing) repeated at least once (\+), followed by a literal .WAV (\.WAV).
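The two options only differ when a name contains something other than digits. The sketch below uses a hypothetical file name, intro_7.WAV, that does not appear in the question, purely to show that difference (GNU grep assumed).

```shell
# Hypothetical line whose file name mixes letters and digits (not from the question).
line='{"filename": "/assets/JFM/imaging/intro_7.WAV"},'

# Option 1 keeps the whole file name; Option 2 keeps only the trailing digits.
whole=$(printf '%s\n' "$line" | grep -o "[^/]\+\.WAV")
digits=$(printf '%s\n' "$line" | grep -o "[0-9]\+\.WAV")
printf 'option 1: %s\noption 2: %s\n' "$whole" "$digits"
```

For the purely numeric names in the question, both options give the same result.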
In all cases, I recommend using the flags -h, -o, and -i, which I have concatenated into a single option -hoi. -h suppresses the file name from the output. -o makes grep output only the portion that matches. -i makes the match case-insensitive, so should your extension ever change to .wav instead of .WAV, you'll be fine.
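To see what -i buys you, here is a minimal sketch with a hypothetical lowercase extension (the line, with_i, and without_i variables are illustrative; GNU grep assumed). Note that -h only makes a visible difference when grep is given more than one file, since that is when file names get printed.

```shell
# Hypothetical variant of a line from the question, with a lowercase extension.
line='{"filename": "/assets/JFM/imaging/19001.wav"},'

# With -i the pattern still matches; without it, grep finds nothing and exits non-zero.
with_i=$(printf '%s\n' "$line" | grep -oi "[^/]\+\.WAV")
without_i=$(printf '%s\n' "$line" | grep -o "[^/]\+\.WAV" || true)
printf 'with -i: [%s]\nwithout -i: [%s]\n' "$with_i" "$without_i"
```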
Also, in all cases, the input is up to you. You can pipe it in from another program, which will look like
program | grep -hoi "[^/]\+\.WAV"
You can get it from a file using stdin redirection:
grep -hoi "[^/]\+\.WAV" < somefile.txt
Or you can just pass the filename to grep:
grep -hoi "[^/]\+\.WAV" somefile.txt
Upvotes: 1
Reputation: 3147
Try this -
awk -F'[{":}/]' '{print $(NF-2)}' f
19001.WAV
19307.WAV
19002.WAV
OR
egrep -o '[[:digit:]]{5}\.WAV' f
19001.WAV
19307.WAV
19002.WAV
OR
egrep -o '[[:digit:]]{5}\.[[:alpha:]]{3}' f
19001.WAV
19307.WAV
19002.WAV
In the egrep versions, the repetition counts for the digits and letters are fixed, so adjust them to match your data (the second example in the question has three-digit names). The awk version works for both cases unchanged.
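If one pattern should cover both examples, a repetition range is one way to do it. This is a sketch rather than part of the original answer: it uses grep -E (the modern spelling of egrep) and assumes names are between three and five digits long.

```shell
# One line of each shape from the question.
sample='{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/LN_405999/101.WAV"}'

# {3,5} accepts three to five digits; \. matches a literal dot.
names=$(printf '%s\n' "$sample" | grep -Eo '[[:digit:]]{3,5}\.WAV')
printf '%s\n' "$names"
```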
Upvotes: 1
Reputation: 14965
Just use grep, as proposed in the comments:
grep -o '[^/]\{1,\}\.WAV' yourfile
If the file name always consists of digits, this is more explicit (same result):
grep -o '[0-9]\{1,\}\.WAV'
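For reference, \{1,\} is the POSIX basic-regex way of writing "one or more"; GNU grep also accepts \+ for the same thing, as an extension. A short sketch confirming that the two patterns above agree on the question's data (variable names are illustrative):

```shell
# One line of each shape from the question.
sample='{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/LN_405999/101.WAV"}'

# Any non-slash run before .WAV versus digits only before .WAV.
any_name=$(printf '%s\n' "$sample" | grep -o '[^/]\{1,\}\.WAV')
digits_only=$(printf '%s\n' "$sample" | grep -o '[0-9]\{1,\}\.WAV')

# Report whether the two results are identical for this input.
if [ "$any_name" = "$digits_only" ]; then echo "same result"; else echo "different"; fi
```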
Upvotes: 3
Reputation: 67527
Another awk option:
awk -F'[/"]' '{print $(NF-1)}' file
19001.WAV
19307.WAV
19002.WAV
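To see why $(NF-1) is the right field, it can help to list every field awk produces when splitting on / or ". A small sketch on one sample line (the line and name variables are illustrative):

```shell
line='{"filename": "/assets/JFM/imaging/19001.WAV"},'

# Print each field with its index; the file name is the second-to-last one,
# because the last field holds whatever follows the closing quote.
printf '%s\n' "$line" | awk -F'[/"]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'

name=$(printf '%s\n' "$line" | awk -F'[/"]' '{ print $(NF-1) }')
printf '%s\n' "$name"
```

Counting from the end is what keeps this independent of how deep the path is.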
Upvotes: 1
Reputation: 95325
Assuming there are [ and ] lines at the beginning and end of your file, it looks like your input is JSON, in which case I would recommend installing and using jq rather than text-based utilities, and doing something like this:
jq -r '.[]|.filename|split("/")[-1]'
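A minimal sketch of that, assuming jq is installed. The json variable is hypothetical: it wraps two of the question's lines in [ and ] so that the whole thing is a valid JSON array.

```shell
# Hypothetical complete JSON document built from the question's lines.
json='[
{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/imaging/19307.WAV"}
]'

# Only run when jq is available; take the last /-separated piece of each filename.
if command -v jq >/dev/null 2>&1; then
  names=$(printf '%s\n' "$json" | jq -r '.[] | .filename | split("/")[-1]')
  printf '%s\n' "$names"
fi
```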
But failing that, any of the tools listed will work just fine.
grep -o '[^/]*\.WAV'
or
sed -ne 's,.*/\([^/]*\.WAV\).*$,\1,p'
or
awk -F'"' '/WAV/ {n = split($4,a,"/"); print a[n]}'
In each case there are a variety of other possible solutions as well.
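One thing worth checking is how these behave on the [ and ] lines themselves. The sketch below uses a hypothetical input with those wrapper lines added. The sed command prints only lines where the substitution succeeded (-n plus the p flag), and the awk command only acts on lines containing WAV, so the brackets should drop out. The awk line uses the return value of split as the element count.

```shell
# Hypothetical input: two lines from the question wrapped in [ and ].
input='[
{"filename": "/assets/JFM/LN_405999/101.WAV"},
{"filename": "/assets/JFM/LN_405999/102.WAV"}
]'

with_sed=$(printf '%s\n' "$input" | sed -ne 's,.*/\([^/]*\.WAV\).*$,\1,p')
with_awk=$(printf '%s\n' "$input" | awk -F'"' '/WAV/ { n = split($4, a, "/"); print a[n] }')
printf '%s\n' "$with_sed"
```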
Upvotes: 2
Reputation: 72717
Or with sed
$ sed 's,.*/,,; s,".*,,' x
101.WAV
102.WAV
103.WAV
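To watch the two substitutions act one at a time, here is a small sketch on a single sample line (the line, step1, and step2 variables are only for illustration):

```shell
line='{"filename": "/assets/JFM/LN_405999/101.WAV"},'

# First substitution: remove everything through the last slash.
step1=$(printf '%s\n' "$line" | sed 's,.*/,,')
# Second substitution: remove the first double quote and everything after it.
step2=$(printf '%s\n' "$step1" | sed 's,".*,,')

printf 'after step 1: %s\nafter step 2: %s\n' "$step1" "$step2"
```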
Explanation:
s,.*/,, - delete everything up to and including the rightmost /
s,".*,, - delete everything from the leftmost " to the end of the line
Upvotes: 1