Hawkeye Roe

Reputation: 43

Using Regex to extract part of a string from an HTML/text file

I have a C# regular expression to match author names in a text document, where they are written as:

"author":"AUTHOR'S NAME"

The regex is as follows:

new Regex("\"author\":\"[A-Za-z0-9]*\\s?[A-Za-z0-9]*")

This returns "author":"AUTHOR'S NAME. However, I don't want the quotation marks or the "author" key in front of it. I just want the name.

Could anyone help me get the expected value please?

Upvotes: 0

Views: 607

Answers (2)

Wiktor Stribiżew

Reputation: 626690

You can also use a look-around approach so that the match value contains only the name:

var txt = "\"author\":\"AUTHOR'S NAME\"";
var rgx = new Regex(@"(?<=""author"":"")[^""]+(?="")");
var result = rgx.Match(txt).Value;

In my test, this regex ran at about 555,020 iterations per second on this input string, which should suffice.

result will be AUTHOR'S NAME.

(?<="author":") checks that "author":" appears immediately before the match; [^"]+ matches one or more characters other than a quote, which looks safe since you only want to match alphanumerics and spaces between the quotes; and (?=") checks for the trailing quote.
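When the file contains more than one author field, the same look-around pattern can be applied with Regex.Matches to collect every name. The following is a minimal sketch, not part of the original answer; the sample string and the variable names (sample, authorRegex, names) are assumptions made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input containing two author fields.
string sample = "{\"author\":\"AUTHOR'S NAME\",\"title\":\"x\"} {\"author\":\"Jane Doe 2\"}";

// Same look-around pattern as above: only the text between the quotes is consumed.
var authorRegex = new Regex(@"(?<=""author"":"")[^""]+(?="")");

// Collect every match value; the look-arounds keep the key and the quotes out of Value.
string[] names = authorRegex.Matches(sample)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string name in names)
    Console.WriteLine(name);
```

Because [^"]+ accepts any character except a quote, names containing an apostrophe are matched as well.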

Upvotes: 0

davidgiga1993

Reputation: 2853

Use regex groups to get a part of the string. Parentheses ( ) define a capture group, which can be accessed through the .Groups property of the match.

.Groups[0] holds the whole match

.Groups[1] holds the first capture group (and so on)

string pattern = "\"author\":\"([A-Za-z0-9]*\\s?[A-Za-z0-9]*)\"";
var match = Regex.Match("\"author\":\"Name123\"", pattern);
// Groups[1] is a Group object; .Value returns the captured text.
string authorName = match.Groups[1].Value;
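The capture-group approach can also be written with a named group and a check of match.Success, so that a failed match is not mistaken for an empty name. This is only a sketch under assumptions: TryExtractAuthor and the group name "name" are hypothetical and do not come from the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns true and sets name when the pattern matches.
static bool TryExtractAuthor(string input, out string name)
{
    // Same pattern as above, with the capture group named "name".
    var match = Regex.Match(input, "\"author\":\"(?<name>[A-Za-z0-9]*\\s?[A-Za-z0-9]*)\"");
    name = match.Success ? match.Groups["name"].Value : "";
    return match.Success;
}

bool okSimple = TryExtractAuthor("\"author\":\"Name123\"", out string simple);
bool okTwoWords = TryExtractAuthor("\"author\":\"John Smith\"", out string twoWords);
bool okApostrophe = TryExtractAuthor("\"author\":\"AUTHOR'S NAME\"", out string apostrophe);

Console.WriteLine(okSimple + " " + simple);
Console.WriteLine(okTwoWords + " " + twoWords);
Console.WriteLine(okApostrophe + " " + apostrophe);
```

Note that the character class [A-Za-z0-9] does not include an apostrophe, so an input such as AUTHOR'S NAME does not match this pattern; the class would need to be widened (or replaced with [^"]+) to accept such names.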

Upvotes: 3
