tapan

Reputation: 1796

Extract lines between 2 tokens in a text file using bash

I have a text file that looks like this:

random useless text 
<!-- this is token 1 --> 
para1 
para2 
para3 
<!-- this is token 2 --> 
random useless text again

I want to extract the text between the two tokens (excluding the tokens themselves). I tried the ## and %% parameter expansions to strip everything outside the tokens, but that didn't work; I suspect they are not meant for manipulating text files this large. Any suggestions on how I can do it? Maybe with awk or sed?

Upvotes: 22

Views: 28503

Answers (7)

Kelly Beard

Reputation: 702

sed -n "/TOKEN1/,/TOKEN2/p" inputfile | sed -e '/TOKEN1/d' -e '/TOKEN2/d'

Upvotes: 0

realex

Reputation: 11

No need to call the mighty sed, awk, or Perl. You can do it in pure bash:

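#!/bin/bash
# Print the lines between the two marker lines of t.txt, markers excluded.
STARTFLAG="false"
# read (without IFS=) trims surrounding whitespace, so padded markers still match
while read -r LINE; do
    if [ "$STARTFLAG" == "true" ]; then
        if [ "$LINE" == '<!-- this is token 2 -->' ]; then
            break    # end marker reached: stop reading
        else
            printf '%s\n' "$LINE"
        fi
    elif [ "$LINE" == '<!-- this is token 1 -->' ]; then
        STARTFLAG="true"
    fi
done < t.txt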

Kind regards

realex

Upvotes: 1

Dennis Williamson

Reputation: 360325

No need for head and tail or grep or to read the file multiple times:

sed -n '/<!-- this is token 1 -->/{:a;n;/<!-- this is token 2 -->/b;p;ba}' inputfile

Explanation:

  • -n - don't do an implicit print
  • /<!-- this is token 1 -->/{ - if the starting marker is found, then
    • :a - label "a"
      • n - read the next line
      • /<!-- this is token 2 -->/b - if it's the ending marker, branch to the end of the script (stop printing and resume scanning with the next line)
      • p - otherwise, print the line
    • ba - branch to label "a"
  • } end if
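
To see the loop above in action, here is a small sketch that runs the same command on sample text. It assumes GNU sed (the one-line form with ; after a label may not parse in other seds), and the temporary file and the second marked block are made up for illustration; the second block is there to show that b resumes scanning after an ending marker.

```shell
#!/bin/bash
# Hypothetical sample input with two marked blocks.
sample=$(mktemp)
printf '%s\n' \
    'random useless text' \
    '<!-- this is token 1 -->' \
    'para1' \
    'para2' \
    '<!-- this is token 2 -->' \
    'more useless text' \
    '<!-- this is token 1 -->' \
    'para3' \
    '<!-- this is token 2 -->' \
    'trailing text' > "$sample"

# Same command as in the answer (GNU sed syntax assumed).
extracted=$(sed -n '/<!-- this is token 1 -->/{:a;n;/<!-- this is token 2 -->/b;p;ba}' "$sample")
printf '%s\n' "$extracted"
rm -f "$sample"
```

On this sample the lines between both pairs of markers are printed and everything else is dropped.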

Upvotes: 42

CaptainChristo

Reputation: 101

Maybe sed and awk have more elegant solutions, but I have a "poor man's" approach with grep, cut, head, and tail.

#!/bin/bash

dataFile="/path/to/some/data.txt"
startToken="token 1"
stopToken="token 2"

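# Take only the first match of each token so the arithmetic below gets a single number
startTokenLine=$( grep -n "${startToken}" "${dataFile}" | head -n 1 | cut -f 1 -d':' )
stopTokenLine=$( grep -n "${stopToken}" "${dataFile}" | head -n 1 | cut -f 1 -d':' )

# Last line to keep, and the number of lines between the tokens
stopTokenLine=$(( stopTokenLine - 1 ))
tailLines=$(( stopTokenLine - startTokenLine ))

head -n "${stopTokenLine}" "${dataFile}" | tail -n "${tailLines}"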

Upvotes: 2

Peter Taylor

Reputation: 5066

You can extract it, including the tokens, with sed. Then use head and tail to strip the tokens off.

... | sed -n "/this is token 1/,/this is token 2/p" | head -n-1 | tail -n+2
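
Note that head -n-1 (drop the last line) is a GNU extension and may not work with other head implementations. As a sketch of the same idea using only sed, a second sed can delete the first and last lines of the extracted range; the sample text fed in through printf is made up for illustration.

```shell
#!/bin/bash
# Illustrative input; any command producing the text can replace printf.
between=$(printf '%s\n' \
    'random useless text' \
    '<!-- this is token 1 -->' \
    'para1' 'para2' 'para3' \
    '<!-- this is token 2 -->' \
    'random useless text again' |
    sed -n '/this is token 1/,/this is token 2/p' |
    sed '1d;$d')    # 1d drops the start token, $d drops the end token
printf '%s\n' "$between"
```

As with head and tail, this only strips the outermost tokens, so it assumes a single marked block in the input.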

Upvotes: 26

aioobe

Reputation: 421090

Try the following:

sed -n '/<!-- this is token 1 -->/,/<!-- this is token 2 -->/p' your_input_file \
        | grep -v '<!-- this is token . -->'

Upvotes: 1

Brian Agnew

Reputation: 272337

For anything like this, I'd reach for Perl, with its combination of (amongst others) sed and awk capabilities. Something like (beware - untested):

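use strict;
use warnings;

my $recording = 0;   # true while we are between the two tokens
my @results = ();
while (<STDIN>) {
   chomp;
   if (/token 1/) {
      $recording = 1;
   }
   elsif (/token 2/) {
      $recording = 0;
   }
   elsif ($recording) {
      push @results, $_;
   }
}
print "$_\n" for @results;   # emit the collected lines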

Upvotes: 0
