Reputation: 854
I have a text file that contains something like this (this is only an excerpt):
Third Doctor
Season 7
051 Spearhead from Space 4 3—24 January 1970
052 Doctor Who and the Silurians 7 31 January—14 March 1970
053 The Ambassadors of Death 7 21 March—2 May 1970
054 Inferno 7 9 May—20 June 1970
Season 8
055 Terror of the Autons 4 2—23 January 1971
056 The Mind of Evil 6 30 January—6 March 1971
057 The Claws of Axos 4 13 March—3 April 1971
058 Colony in Space 6 10 April—15 May 1971
059 The Dæmons 5 22 May—19 June 1971
Note that the basic line pattern is ^###\t.*\t?\t.*$ (i.e. almost every line has 3 tabs, \t).
I would like to remove everything after the episode title, so that it would look like this:
Third Doctor
Season 7
051 Spearhead from Space
052 Doctor Who and the Silurians
053 The Ambassadors of Death
054 Inferno
Season 8
055 Terror of the Autons
056 The Mind of Evil
057 The Claws of Axos
058 Colony in Space
059 The Dæmons
So far I have tested the following pattern in gedit:
([^\t]*)$ # replaces not only everything after the last `\t',
# incl. that `\t', but also lines that *do not* contain any `\t'
Then I tried to ‘make a selection’ of the lines that should be processed by the regex, using:
(?=(?=^(?:(?!Season).)*$)(?=^(?:(?!Series).)*$)(?=^(?:(?!Doctor$).)*$)(?=^(?:(?!Title).)*$)(?=^(?:(?!Specials$).)*$)(?=^(?:(?!Mini).)*$)(?=^(?:(?!^\t).)*$)(?=^(?:(?!Anim).)*$)).*$
This does work as intended, BUT I do not know how to combine it with ([^\t]*)$.
Upvotes: 0
Views: 78
Reputation: 89557
Since the fields are separated by tabs, you only need cut to obtain the first two fields:
cut -f1,2 drwho.txt
For reference, the same with awk:
awk -F"\t" '$3{print $1"\t"$2}!$3{print $0}' drwho.txt
Explanation: awk works line by line; the -F option defines the field delimiter.
$3 { # if field 3 is non-empty (and not 0)
print $1"\t"$2 # print field 1, a tab, field 2
}
!$3 { # otherwise (field 3 missing, empty, or 0)
print $0 # print the whole record (the line)
}
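If neither cut nor awk is at hand, the same field logic can be sketched in Python. This is only an illustration, not part of the commands above: the function name keep_number_and_title is made up, and the sample lines assume the columns in the question really are separated by tab characters. It counts fields instead of testing whether field 3 is truthy, so a third field of 0 would still be treated as present.

```python
def keep_number_and_title(line: str) -> str:
    """Keep the first two tab-separated fields; leave shorter lines untouched."""
    fields = line.split("\t")
    if len(fields) >= 3:
        # Episode line: number, title, and at least one more column.
        return "\t".join(fields[:2])
    # Heading line such as "Season 7": nothing to strip.
    return line


# Sample lines taken from the question (tabs assumed between columns).
sample = [
    "Third Doctor",
    "Season 7",
    "051\tSpearhead from Space\t4\t3—24 January 1970",
    "054\tInferno\t7\t9 May—20 June 1970",
]

for line in sample:
    print(keep_number_and_title(line))
```

Heading lines pass through unchanged, which matches what cut does for lines that contain no delimiter.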
Upvotes: 1
Reputation: 67968
^(\d{3}\s+.*?)(?=\s*\d).*$
Try this. Replace with $1. Use the m or MULTILINE flag, depending on your flavour of regex. See the demo:
http://regex101.com/r/jI8lV7/8
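As a rough check of how this behaves, here is a small Python sketch that applies the pattern with re.MULTILINE to a few lines from the question (tabs assumed between columns). Note that the lazy capture stops at the first digit after the title starts, so a title that itself contains a digit would be cut short. TAB_PATTERN is an alternative added here for comparison, not part of the pattern above; it relies only on the tabs and touches only lines that have at least two of them.

```python
import re

# Pattern from this answer: capture number and title lazily, stop before
# the first "optional whitespace + digit", then consume the rest of the line.
DIGIT_PATTERN = re.compile(r"^(\d{3}\s+.*?)(?=\s*\d).*$", re.MULTILINE)

# Alternative (an assumption, not from the answer): keep the first two
# tab-separated fields of any line that has a second tab.
TAB_PATTERN = re.compile(r"^([^\t\n]*\t[^\t\n]*)\t.*$", re.MULTILINE)

text = (
    "Third Doctor\n"
    "Season 7\n"
    "051\tSpearhead from Space\t4\t3—24 January 1970\n"
    "054\tInferno\t7\t9 May—20 June 1970\n"
)

# In Python the replacement "$1" is written as r"\1".
digit_result = DIGIT_PATTERN.sub(r"\1", text)
tab_result = TAB_PATTERN.sub(r"\1", text)

print(digit_result)
print(tab_result)
```

On this sample both substitutions give the same text; they would differ for a title that contains a digit.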
Upvotes: 0