user1899415

Reputation: 3125

Regex to extract only text after string and before space

I want to match the text that follows a given string. In this case, I want the text on lines starting with "BookTitle:", up to the first space:

BookTitle:HarryPotter JK Rowling
BookTitle:HungerGames Suzanne Collins
Author:StephenieMeyer BookTitle:Twilight

Desired output is:

HarryPotter
HungerGames

I tried "^BookTitle(.*)", but it gives me matches where "BookTitle:" is in the middle of a line, and it also captures everything after the whitespace. Can anyone help?

Upvotes: 34

Views: 138261

Answers (3)

John Woo

Reputation: 263683

You can use a positive lookbehind in your pattern.

 (?<=BookTitle:).*?(?=\s)

For more info: Lookahead and Lookbehind Zero-Width Assertions
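
A quick check of this pattern, sketched in Python (an assumption, since the question does not name a language; the variable names are made up for illustration). It suggests two things to watch for: the lookbehind does not anchor to the start of the line, and the lookahead needs a whitespace character after the title, so a mid-line "BookTitle:" only matches once something like a newline follows it.

```python
import re

# Pattern from the answer above, unchanged.
pattern = re.compile(r"(?<=BookTitle:).*?(?=\s)")

lines = [
    "BookTitle:HarryPotter JK Rowling",
    "BookTitle:HungerGames Suzanne Collins",
    "Author:StephenieMeyer BookTitle:Twilight",
]

# Line by line: "Twilight" is not followed by whitespace, so it is skipped.
per_line = [m.group(0) for line in lines for m in pattern.finditer(line)]
print(per_line)    # ['HarryPotter', 'HungerGames']

# As one block of text ending in a newline: the newline satisfies (?=\s),
# so the mid-line "BookTitle:Twilight" now matches as well.
text = "\n".join(lines) + "\n"
whole_text = [m.group(0) for m in pattern.finditer(text)]
print(whole_text)  # ['HarryPotter', 'HungerGames', 'Twilight']
```

If only line-initial titles are wanted, an anchored pattern is probably the safer choice.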

Upvotes: 49

user557597


With the 'multi-line' regex option use something like this:

 ^BookTitle:([^\s]+)  

Without the multi-line option, use this:

 (?:^|\n)BookTitle:([^\s]+)
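
For illustration, here is how both variants might look in Python (an assumption; the question does not say which language is in use). The multi-line option is spelled re.MULTILINE there, and the variable names are hypothetical.

```python
import re

text = (
    "BookTitle:HarryPotter JK Rowling\n"
    "BookTitle:HungerGames Suzanne Collins\n"
    "Author:StephenieMeyer BookTitle:Twilight"
)

# With the multi-line option, ^ matches at the start of every line.
multiline_titles = re.findall(r"^BookTitle:([^\s]+)", text, flags=re.MULTILINE)

# Without it, ^ matches only at the start of the whole string,
# so the pattern accepts a preceding newline as an alternative.
plain_titles = re.findall(r"(?:^|\n)BookTitle:([^\s]+)", text)

print(multiline_titles)  # ['HarryPotter', 'HungerGames']
print(plain_titles)      # ['HarryPotter', 'HungerGames']
```

In both cases the title is in capture group 1, which is why findall returns just the titles and not the "BookTitle:" prefix.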

Upvotes: 5

Edward

Reputation: 1000

What language is this?
Please provide some code as well. With the ^ anchor you should only be matching strings that begin with BookTitle, so something else is wrong.
If you can guarantee that the titles contain no whitespace, as in your examples, then ^BookTitle:(\S+) should work in many languages.
Explanation:
^ requires the match to start at the beginning of the string, as you know.
\s - *lower*case means: match on white*s*pace (space, tab, etc.)
\S - *upper*case means the inverse: match on anything BUT whitespace.
\w is another possibility: match on *w*ord character (alphanumeric plus underscore) - but that will fail you if, for example, there's an apostrophe in the title.
+, as you know, is a quantifier meaning "at least one of".
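
To make the \S versus \w difference concrete, here is a small sketch in Python (an assumed language, since none was given). The apostrophe title and the extract helper are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample lines; the second title contains an apostrophe.
lines = [
    "BookTitle:HarryPotter JK Rowling",
    "BookTitle:Ender'sGame Orson Scott Card",
    "Author:StephenieMeyer BookTitle:Twilight",
]

def extract(pattern, line):
    """Return capture group 1 if the pattern matches the line, else None."""
    m = re.search(pattern, line)
    return m.group(1) if m else None

# \S+ keeps going until the first whitespace character.
non_space = [extract(r"^BookTitle:(\S+)", line) for line in lines]

# \w+ stops at the apostrophe, cutting the title short.
word_only = [extract(r"^BookTitle:(\w+)", line) for line in lines]

print(non_space)  # ['HarryPotter', "Ender'sGame", None]
print(word_only)  # ['HarryPotter', 'Ender', None]
```

The third line yields None in both cases because ^ rejects a "BookTitle:" that is not at the start of the string.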
Hope that helps.

Upvotes: 8
