Reputation: 57
I'm trying to extract the torrent name from torrent files.
Without looking too deeply into how torrent files are structured, I noticed that I only need to match everything between the last ":" and the string "12:piece lengthi".
Here is the beginning of the Arch Linux ISO torrent file:
d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi
I need to extract archlinux-2015.07.01-dual.iso, which is between ":" and "12:piece lengthi". I checked this pattern against other torrent files, and in my case it will work. I can't figure out how to combine the regexes (?<=:)(.*)(?=12:piece lengthi) and :(?:.(?!:))+$, or whether they are even correct at all.
I'm trying to do this in a bash script with grep, awk, sed, or any other standard Linux command.
Final, fully working solution (thoroughly tested). It works with all kinds of non-standard characters, for example Cyrillic:
torrent_title=$(tr -d "\n" < "$filename" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/')
Update: All the suggestions work on the sample, but torrent files are binary files. For example, I tried grep --text, and also strings file piped to grep or sed, but random strings from the binary data mess up the output.
Update 2 (solved): the final command is this:
head -n 1 file.torrent | strings | tr -d "\n\r" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/'
I figured out that the info is only in the first line of the file. In my original example I forgot to copy a couple more strings at the end:
d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi524288e6:pieces25840:
These are part of the first line, so I needed to slightly change hek2mgl's sed answer (adding .* after 12:piece lengthi so the rest of the line is consumed too).
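The iconv -c step is there because, in a UTF-8 locale, GNU sed's "." generally won't match byte sequences that aren't valid UTF-8, and the binary pieces data very likely contains such bytes. A possible alternative, sketched here assuming bash and GNU sed (torrent_name_sed is a hypothetical helper name), is to run sed in the C locale, where "." matches any single byte, so nothing has to be dropped first. Using sed -n with the p flag also means nothing is printed when "12:piece lengthi" isn't found, instead of echoing the whole file back.

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU sed; torrent_name_sed is a hypothetical helper name.
# In the C locale "." matches any single byte, so the binary data no longer
# has to be cleaned up with iconv -c before sed sees it.
torrent_name_sed() {
    # Delete newlines so the whole file becomes one line for sed, then print
    # only what lies between the last ":" before "12:piece lengthi" and it.
    # -n plus the p flag: print nothing if the pattern is not found.
    tr -d '\n' < "$1" | LC_ALL=C sed -n 's/.*:\(.*\)12:piece lengthi.*/\1/p'
}
```

Usage: torrent_name_sed file.torrent. Non-ASCII names such as Cyrillic pass through byte for byte, but a name that itself contains ":" will still be cut at its last colon, as with the other regex-based answers.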
Update 3: The right way to do it is to use a proper parser for the torrent (bencode) format; I learned that the hard way.
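In bencode every string is prefixed with its byte length, so in the sample the key "name" is stored as 4:name and its value as 29:archlinux-2015.07.01-dual.iso, meaning the value is exactly 29 bytes long. A minimal sketch that uses that length instead of the "12:piece lengthi" delimiter, assuming bash, GNU grep (-a, -b, -o, -E) and coreutils tail/head; torrent_name is a hypothetical helper. It is still a heuristic rather than a real bencode parser: it takes the first "4:name<len>:" in the file and assumes it is the name key of the info dictionary.

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU grep and coreutils; torrent_name is a hypothetical helper.
# Heuristic, not a full bencode parser: the first "4:name<len>:" in the file
# is assumed to be the name key of the info dictionary.
torrent_name() {
    local file=$1 hit offset prefix len
    # -a: treat binary as text, -b: print byte offset, -o: only the match.
    # LC_ALL=C keeps matching and offsets byte-based.
    hit=$(LC_ALL=C grep -aboE '4:name[0-9]+:' "$file" | head -n 1)
    [ -n "$hit" ] || return 1
    offset=${hit%%:*}        # 0-based byte offset of the match
    prefix=${hit#*:}         # the matched text, e.g. "4:name29:"
    len=${prefix#4:name}     # "29:"
    len=${len%:}             # "29" = length of the name in bytes
    # tail -c +N starts at byte N (1-based); then read exactly len bytes.
    tail -c +"$((offset + ${#prefix} + 1))" "$file" | head -c "$len"
}
```

Usage: torrent_name file.torrent. Because it reads exactly len bytes, a name containing ":" or multibyte characters (e.g. Cyrillic) comes back intact, which the ":"-based regexes can't guarantee.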
Upvotes: 2
Views: 657
Reputation: 158040
I would use sed for that, like this:
sed 's/.*:\(.*\)12:piece lengthi/\1/' input.torrent
Upvotes: 2
Reputation: 88654
Try this with GNU grep:
grep -oP ':\K[^:]*(?=12:piece lengthi$)' file
Output:
archlinux-2015.07.01-dual.iso
Upvotes: 2