tukusejssirs

Reputation: 854

Merge 2 regex patterns

I have a text file that contains something like this (this is only an excerpt):

Third Doctor
Season 7
051 Spearhead from Space    4   3—24 January 1970
052 Doctor Who and the Silurians    7   31 January—14 March 1970
053 The Ambassadors of Death    7   21 March—2 May 1970
054 Inferno 7   9 May—20 June 1970

Season 8
055 Terror of the Autons    4   2—23 January 1971
056 The Mind of Evil    6   30 January—6 March 1971
057 The Claws of Axos   4   13 March—3 April 1971
058 Colony in Space 6   10 April—15 May 1971
059 The Dæmons  5   22 May—19 June 1971

Note that the basic line pattern is ^###\t.*\t?\t.*$ (i.e., almost every line has 3 tabs, \t).

I would like to remove everything after the episode title, so that it would look like this:

Third Doctor
Season 7
051 Spearhead from Space
052 Doctor Who and the Silurians
053 The Ambassadors of Death
054 Inferno

Season 8
055 Terror of the Autons
056 The Mind of Evil
057 The Claws of Axos
058 Colony in Space
059 The Dæmons

So far I have tested the following pattern in gedit:

([^\t]*)$   # replaces not only everything after the last `\t',
            # incl. that `\t', but also lines that *do not* contain any `\t'

Then I tried to ‘make a selection’ of the lines that the regex should be applied to, using (?=(?=^(?:(?!Season).)*$)(?=^(?:(?!Series).)*$)(?=^(?:(?!Doctor$).)*$)(?=^(?:(?!Title).)*$)(?=^(?:(?!Specials$).)*$)(?=^(?:(?!Mini).)*$)(?=^(?:(?!^\t).)*$)(?=^(?:(?!Anim).)*$)).*$. This does work as intended, but I do not know how to combine it with ([^\t]*)$.

Upvotes: 0

Views: 78

Answers (2)

Casimir et Hippolyte

Reputation: 89557

Since the fields are separated by tabs, you only need cut to obtain the first two fields:

cut -f1,2 drwho.txt

For reference, the same with awk:

awk -F"\t" '$3{print $1"\t"$2}!$3{print $0}' drwho.txt

Explanation: awk works line by line; the -F option defines the field delimiter.

$3 {                   # if field3 exists
    print $1"\t"$2     # display field1, a tab, field2
}
!$3 {                  # if field3 doesn't exist
    print $0           # display the whole record (the line)
}
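
If a scripting language is preferred over cut or awk, the same field-based idea can be sketched in Python. This is only an illustration: the function name keep_first_two_fields and the sample string are made up for the example, and the sample assumes the columns really are separated by tabs, as the question states.

```python
def keep_first_two_fields(text: str) -> str:
    """Keep only the first two tab-separated fields of every line."""
    kept = []
    for line in text.split("\n"):
        fields = line.split("\t")
        # Lines without a tab have a single field and pass through unchanged.
        kept.append("\t".join(fields[:2]))
    return "\n".join(kept)


# "\u2014" is the dash that appears in the dates of the question's excerpt.
sample = (
    "Third Doctor\n"
    "Season 7\n"
    "051\tSpearhead from Space\t4\t3\u201424 January 1970\n"
    "054\tInferno\t7\t9 May\u201420 June 1970\n"
)

print(keep_first_two_fields(sample), end="")
```

Heading lines such as "Season 7" contain no tab, so they are kept whole, which is also what cut does by default for lines without the delimiter.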

Upvotes: 1

vks

Reputation: 67968

^(\d{3}\s+.*?)(?=\s*\d).*$

Try this. Replace by $1. Use flags m or MULTILINE depending on your flavour of regex. See demo.

http://regex101.com/r/jI8lV7/8
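
As a rough sketch of how the pattern above behaves outside an editor, here it is applied with Python's re module, where the flag is re.MULTILINE and the replacement is written \1 instead of $1. The names PATTERN, strip_after_title and sample are invented for this example, and the sample assumes tab-separated columns.

```python
import re

# The pattern from this answer, compiled so that ^ and $ match at every line.
PATTERN = re.compile(r"^(\d{3}\s+.*?)(?=\s*\d).*$", re.MULTILINE)


def strip_after_title(text: str) -> str:
    """Replace each matching line with capture group 1 (number and title)."""
    return PATTERN.sub(r"\1", text)


# "\u2014" is the dash that appears in the dates of the question's excerpt.
sample = (
    "Third Doctor\n"
    "Season 7\n"
    "051\tSpearhead from Space\t4\t3\u201424 January 1970\n"
    "052\tDoctor Who and the Silurians\t7\t31 January\u201414 March 1970\n"
)

print(strip_after_title(sample), end="")
```

Lines that do not start with three digits are left alone, so no separate exclusion list is needed. One caveat: the lazy .*? stops before the first digit after the episode number, so a title that itself contains a digit would be cut short at that point.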

Upvotes: 0
