Reputation: 43
I have a C# regular expression to match author names in a text document that is written as:
"author":"AUTHOR'S NAME"
The regex is as follows:
new Regex("\"author\":\"[A-Za-z0-9]*\\s?[A-Za-z0-9]*")
This returns "author":"AUTHOR'S NAME. However, I don't want the quotation marks or the word author before it. I just want the name.
Could anyone help me get the expected value please?
Upvotes: 0
Views: 607
Reputation: 626690
You can also use a look-around approach, so that the match value itself contains only the name:
var txt = "\"author\":\"AUTHOR'S NAME\"";
var rgx = new Regex(@"(?<=""author"":"")[^""]+(?="")");
var result = rgx.Match(txt).Value;
My regex runs at about 555,020 iterations per second with this input string, which should suffice.
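result will be AUTHOR'S NAME.
(?<="author":") checks that "author":" comes right before the match, [^"]+ matches one or more characters other than a double quote (this looks safe since you only want to match alphanumerics and space between the quotes), and (?=") checks for the trailing quote. Neither look-around is included in the match value.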
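If the document contains several entries, the same look-around pattern can probably be applied with Regex.Matches. The following is only a sketch: the class name AuthorLookaround, the method FindAuthors, and the sample text are made up for illustration and are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AuthorLookaround
{
    // Same pattern as above: the look-behind and look-ahead assert the
    // surrounding text without making it part of the match.
    private static readonly Regex Rgx =
        new Regex(@"(?<=""author"":"")[^""]+(?="")");

    // Returns every author name found in the input text (hypothetical helper).
    public static List<string> FindAuthors(string text)
    {
        var names = new List<string>();
        foreach (Match m in Rgx.Matches(text))
        {
            names.Add(m.Value);
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical sample input with two entries.
        var txt = "{\"author\":\"Jane Doe\"},{\"author\":\"John O'Neil\"}";
        foreach (var name in FindAuthors(txt))
        {
            Console.WriteLine(name);
        }
    }
}
```

With this sample input the loop prints Jane Doe and then John O'Neil. Note that [^"]+ accepts any character except a double quote, so apostrophes and punctuation are included too.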
Upvotes: 0
Reputation: 2853
Use regex groups to get a part of the string. ( ) acts as a capture group, and its captured text can be accessed through the .Groups property of the match.
.Groups[0] holds the whole match
.Groups[1] holds the first capture group (and so on)
string pattern = "\"author\":\"([A-Za-z0-9]*\\s?[A-Za-z0-9]*)\"";
var match = Regex.Match("\"author\":\"Name123\"", pattern);
string authorName = match.Groups[1].Value; // Groups[1] is a Group; .Value gives the captured text
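A variation on the capture-group idea is to name the group and check match.Success before reading it. This is a sketch under assumptions: the group name "name", the class AuthorGroups, and the method ExtractAuthor are illustrative, and the character class is swapped for [^"]* so that names containing apostrophes are accepted, which may or may not suit your data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AuthorGroups
{
    // The named group "name" (an illustrative choice) captures the text
    // between the quotes that follow "author":
    private const string Pattern = "\"author\":\"(?<name>[^\"]*)\"";

    // Returns the author name, or null when the input has no author entry.
    public static string ExtractAuthor(string input)
    {
        Match match = Regex.Match(input, Pattern);
        return match.Success ? match.Groups["name"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractAuthor("\"author\":\"Name123\""));
        Console.WriteLine(ExtractAuthor("no author here") ?? "(none)");
    }
}
```

Reading the group by name keeps the code stable if more groups are added to the pattern later, and the Success check avoids silently returning an empty string when nothing matched.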
Upvotes: 3