Reputation: 699
I am trying to extract track information from MKV files using mkvinfo from a bash script. The output is a long series of lines with repeating patterns as delimiters for various track properties of various track types. An example of a track is:
…
| + A track
| + Track number: 6 (track ID for mkvmerge & mkvextract: 5)
| + Track UID: 11555278830806058806
| + Track type: subtitles
| + (Unknown element: TrickTrackFlag; ID: 0xc6 size: 3)
| + Enabled: 1
| + Default flag: 0
| + Forced flag: 0
| + Lacing flag: 0
| + MinCache: 0
| + Timecode scale: 1
| + Name: Spanish
| + Language: spa
| + Codec ID: S_TEXT/UTF8
| + (Unknown element: TrackAttachmentLink; ID: 0x7446 size: 11)
| + Codec decode all: 1
| + A track
| + Track number: 7 (track ID for mkvmerge & mkvextract: 6)
…
There can be multiple instances of a given track type and the number of lines for a track is somewhat variable. I need to extract certain track properties from specific track types. For example, if I want to find all instances of the subtitles track type and extract the Track number and the Codec ID, I can pipe the results through grep:
mkvinfo "file.mkv" | grep "subtitles" -B 2 | grep "Track number"
This outputs the lines containing the track numbers for all subtitle tracks. I have to put the lines into an array and filter them to get the first number so I can use it with mkvpropedit, which requires the first number.
Similarly:
mkvinfo "file.mkv" | grep "subtitles" -A 10 | grep "Codec ID: " | sed 's/^.*: //'
outputs the codec IDs for all subtitle tracks.
This works fine IF I know exactly how many lines there are before/after the line containing subtitles. The problem is, the exact number of lines to include varies from file to file. So what I need to do is to output the entire block of lines between | + A track and a line beginning with |+ OR | + OR EOF. I also need to filter the block to extract the first Track number and the Codec ID. I tried using | grep -Eo [0-9]+ | head -1 to extract the first number of each track but it only works on the first track found and quits. If there's a way to make it work for all tracks in one line that would be helpful. The second example I gave using sed works for the Codec ID.
The bottom line QUESTION is:
How can I extract specific properties of specific track types, such as the example given, and put them into an array or arrays for further processing?
I am hoping to be able to meet the following criteria:
bash (GNU bash, version 4.3.30(1)-release (x86_64-apple-darwin12.5.0))
utilities like sed, awk, grep, …
mkvinfo into the various utilities
I found lots of threads that show how to use sed to find a block of text between two words but I could not get the code to work with entire lines or strings containing spaces. Maybe there is a way to do that but I don't know enough about sed to be able to adapt the code to my situation.
Please explain in detail how your code works so I can 'learn how to fish' so next time I can do it myself.
Upvotes: 0
Views: 1452
Reputation: 18940
When processing multiple lines in complex ways, my tool of choice is awk.
Whenever a line matches one of the patterns of interest, we save the captured value in a variable.
Finally, when we encounter the line indicating a new block (| + A track), or we reach the end of the stream, we print the values of the variables we are interested in (track number, codec ID), but only if the track type is subtitles.
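mkvinfo ... | gawk '
# Remember the first number on the "Track number" line and the codec ID.
match($0, /Track number: ([0-9]+)/, m) {TN=m[1]}
match($0, /Codec ID: (.*)$/, m) {CI=m[1]}
/Track type: subtitles/ {SUB=1}
# A new track starts: print the previous one if it was a subtitle track, then reset.
/^\| \+ A track$/ {if(SUB) print TN, CI; SUB=0; TN=""; CI=""}
# The last track has no following "A track" line, so handle it at end of input.
END {if(SUB) print TN, CI}'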
You need gawk because the three-argument form of match(), which stores the parenthesized groups in an array, is a gawk extension.
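The question also asks for the results to end up in arrays. A minimal sketch of that step follows, assuming a bash with arrays and process substitution. The sample_mkvinfo function is a hypothetical stand-in for the real mkvinfo call, and the track lines in it are illustrative, modeled on the excerpt in the question. The awk program here uses sub() instead of the three-argument match(), so it should also run with a non-GNU awk. It prints one tab-separated line per subtitle track, which the while loop splits into two parallel arrays.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for: mkvinfo "file.mkv"
# The lines below are illustrative sample data, not real mkvinfo output.
sample_mkvinfo() {
  cat <<'EOF'
| + A track
| + Track number: 5 (track ID for mkvmerge & mkvextract: 4)
| + Track type: audio
| + Codec ID: A_AAC
| + A track
| + Track number: 6 (track ID for mkvmerge & mkvextract: 5)
| + Track type: subtitles
| + Name: Spanish
| + Codec ID: S_TEXT/UTF8
| + A track
| + Track number: 7 (track ID for mkvmerge & mkvextract: 6)
| + Track type: subtitles
| + Codec ID: S_HDMV/PGS
EOF
}

track_numbers=()
codec_ids=()

# Read "number<TAB>codec" lines produced by awk into two parallel arrays.
while IFS=$'\t' read -r tn ci; do
  track_numbers+=("$tn")
  codec_ids+=("$ci")
done < <(sample_mkvinfo | awk '
  # Print the track collected so far if it was a subtitle track, then reset.
  function emit() {
    if (is_sub) print tn "\t" ci
    is_sub = 0; tn = ""; ci = ""
  }
  /\+ A track$/           { emit() }
  # Keep only the first number that follows "Track number: ".
  /Track number: [0-9]+/  { tn = $0; sub(/.*Track number: /, "", tn); sub(/[^0-9].*/, "", tn) }
  /Codec ID: /            { ci = $0; sub(/.*Codec ID: /, "", ci) }
  /Track type: subtitles/ { is_sub = 1 }
  END                     { emit() }
')

for i in "${!track_numbers[@]}"; do
  printf 'track %s uses codec %s\n' "${track_numbers[i]}" "${codec_ids[i]}"
done
```

With the sample data above, the loop reports tracks 6 and 7 and skips the audio track. To use it on a real file, replace sample_mkvinfo with the actual mkvinfo command. Note that a subtitle track with no Codec ID line would produce an empty field, which this simple read loop does not handle.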
Upvotes: 2