bosadjo

Reputation: 57

Regex match last occurrence of all characters between two strings

I'm trying to extract the torrent name from torrent files. Without looking too deeply into how torrent files are structured, I noticed that I only need to match all characters between the last occurrence of : and the string 12:piece lengthi.

Here is the beginning of the Arch Linux ISO torrent file:

d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi

I need to extract archlinux-2015.07.01-dual.iso, which is between : and 12:piece lengthi. I checked this pattern against other torrent files, and in my case it works. I can't figure out how to combine the regexes (?<=:)(.*)(?=12:piece lengthi) and :(?:.(?!:))+$, or whether they are even correct.
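The first pattern overshoots because the lookbehind (?<=:) is already satisfied right after the first colon in the line, and a regex match starts at the leftmost position it can, so the greedy .* swallows everything up to 12:piece lengthi. Replacing .* with [^:]* forbids crossing a colon, which forces the match to start after the last colon. A minimal sketch, assuming GNU grep built with -P (PCRE) support and the sample header above stored in a shell variable:

```shell
#!/usr/bin/env bash
# Sample header from the question (single line, nothing after the marker).
line='d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi'

# Greedy .*: the match begins after the FIRST colon, so it returns almost the whole line.
printf '%s\n' "$line" | grep -oP '(?<=:).*(?=12:piece lengthi)'

# [^:]* cannot cross a colon, so the only possible match begins after the LAST colon
# before "12:piece lengthi", i.e. it prints archlinux-2015.07.01-dual.iso
printf '%s\n' "$line" | grep -oP '(?<=:)[^:]*(?=12:piece lengthi)'
```

This is essentially the idea behind the [^:]* in Cyrus's answer below; the lookahead still anchors the end of the match to the marker.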

I'm trying to write a bash script using grep, awk, sed, or any other standard Linux command.

Final working solution (thoroughly tested). It works with all kinds of non-standard characters, for example Cyrillic:

torrent_title=$(tr -d "\n" < "$filename" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/')

Update: All suggestions work, but torrent files are binary files. For example, I tried grep --text, and also strings file piped into grep or sed, but random strings from the binary data mess up the output.

Update 2 (solved): the final command is:

head -n 1 file.torrent | strings | tr -d "\n\r" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/'

I found that the information I need is only in the first line of the file. In my original example I forgot to copy a few more strings at the end:

 d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi524288e6:pieces25840:

These are part of the first line, so I needed to slightly change hek2mgl's sed answer.

Update 3: The right way to do this is to use a proper torrent (bencode) parser; I learned that the hard way.
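Short of a full parser, one step in that direction is to use bencode's own length prefix: a string is stored as <length>:<bytes>, so 4:name29: says the next 29 bytes are the name. The function below is a minimal sketch, not a real bencode parser. The name torrent_name is hypothetical, it assumes GNU grep (for -b byte offsets combined with -o), and it assumes the first 4:name<digits>: in the file is the info dictionary's name key, which a comment or URL containing that text would break. Because it counts bytes rather than characters, multi-byte UTF-8 names such as Cyrillic come through intact, and nothing after the name (piece length, binary hashes) is read.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the torrent name using bencode's length prefix.
# Assumption: the first "4:name<digits>:" in the file is the info dict's name key.
torrent_name() {
  local file=$1 hit offset key len
  # -a: treat binary as text, -b: byte offset of the match, -o: print only the match.
  # LC_ALL=C makes grep match raw bytes instead of locale characters.
  hit=$(LC_ALL=C grep -aboE '4:name[0-9]+:' "$file" | head -n 1)
  [ -n "$hit" ] || return 1
  offset=${hit%%:*}   # 0-based byte offset where "4:name..." starts
  key=${hit#*:}       # e.g. "4:name29:"
  len=${key#4:name}   # e.g. "29:"
  len=${len%:}        # e.g. "29"
  # tail -c +N is 1-based: skip past the key, then read exactly len bytes.
  tail -c +"$((offset + ${#key} + 1))" "$file" | head -c "$len"
}
```

Usage: name=$(torrent_name file.torrent). The output has no trailing newline, which suits command substitution.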

Upvotes: 2

Views: 657

Answers (3)

Mario Zannone

Reputation: 2883

Try this:

 sed -e 's/12:piece lengthi//' -e 's/.*://'
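This command works in two passes: the first expression deletes the 12:piece lengthi marker, and the second deletes everything up to and including the last remaining colon (the greedy .* makes the : match the last one). A quick check against the two headers quoted in the question, assuming each is fed in as a single line; it also shows why the longer header from Update 2 needed a different command, since there the last colon comes after pieces25840:

```shell
#!/usr/bin/env bash
short='d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi'
long="${short}524288e6:pieces25840:"

# Pass 1 strips the marker; pass 2 strips through the last colon.
printf '%s\n' "$short" | sed -e 's/12:piece lengthi//' -e 's/.*://'   # archlinux-2015.07.01-dual.iso
printf '%s\n' "$long"  | sed -e 's/12:piece lengthi//' -e 's/.*://'   # empty line: the last colon is at the end
```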

Upvotes: 2

hek2mgl

Reputation: 158040

I would use sed for that, like this:

sed 's/.*:\(.*\)12:piece lengthi/\1/' input.torrent

Upvotes: 2

Cyrus

Reputation: 88654

Try this with GNU grep:

 grep -oP ':\K[^:]*(?=12:piece lengthi$)' file

Output:

archlinux-2015.07.01-dual.iso

Upvotes: 2
