Reputation: 53
I'm looking to select the last part of the string, namely the "Current Address: xxxxx" portion.
Here's the data:
Full Name: John Smith
May Go By: John C Smith
Johnathon Smith
Johnny Smith
Johnny C Smith
Age: 45
Current Address:
1234 SE 2nd st
Los Angeles, CA 12345
Out of the data I extracted, I just want this part:
Current Address:
1234 SE 2nd st
Los Angeles, CA 12345
Since the address changes with every page I scrape, I want a regex that matches from "Current Address:" to the end of the string.
So far I've got
\w{7}\s\w{7}\s
as a regex, but it only matches the "Current Address" part of the string, and I can't figure out what to add so that it matches the rest of the string.
Edit: I do want to keep the part of the regex that matches "Current Address", since that text is static. The only thing that changes from page to page is the address itself, so I need a regex that continues from there until the string ends.
Thanks
Upvotes: 1
Views: 438
Reputation: 6958
(^Current Address:.+)
with multi-line mode (so that ^ matches at the start of each line) and dot-matches-newline mode enabled.
The version below sets those options inline, along with case-insensitive matching. Inline flags are not supported by every regex flavor, but they work in quite a few: (?mis)(^Current Address:.+)
If you decide you don't want to keep the "Current Address:" label itself, you can use:
^Current Address:[ ]\r\n(^.+$)+|^Current Address:[ ]\n(^.+$)+
and keep only capture group 1.
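Note that the pattern above expects exactly one space after the colon before the line break. As a rough sketch only, here is a variant (my assumption, not a tested drop-in) that makes the trailing spaces and the \r optional and puts everything after the label line into group 1. The class name AddressExtractor and the method ExtractAddress are hypothetical names used for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressExtractor
{
    // Hypothetical variant of the pattern above: optional spaces after the
    // colon, either \n or \r\n, and the rest of the string in group 1.
    // Multiline lets ^ match at a line start; Singleline lets . match newlines.
    private static readonly Regex AddressOnly = new Regex(
        @"^Current Address:[ ]*\r?\n(.+)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    // Returns the text after the "Current Address:" line,
    // or an empty string when the label is not found.
    public static string ExtractAddress(string subject)
    {
        Match m = AddressOnly.Match(subject);
        return m.Success ? m.Groups[1].Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        // Shortened sample based on the data in the question.
        string sample = "Age: 45\nCurrent Address:\n1234 SE 2nd st\nLos Angeles, CA 12345";
        Console.WriteLine(ExtractAddress(sample));
    }
}
```

Because the label sits outside the parentheses, m.Groups[1] holds only the address lines, while m.Value would still include the label.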
Since .NET was specified in the comments, here is a sample code-snippet generated by RegexBuddy for C# to create an object with all regex matches in a string:
// Requires: using System.Text.RegularExpressions;
MatchCollection allMatchResults = null;
try {
    // Singleline lets '.' match newlines; Multiline lets '^' match at each line start
    Regex regexObj = new Regex("(^Current Address:.+)", RegexOptions.Singleline | RegexOptions.Multiline);
    allMatchResults = regexObj.Matches(subjectString);
    if (allMatchResults.Count > 0) {
        // Access individual matches using allMatchResults[i]
    } else {
        // Match attempt failed
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
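To see the pattern run end to end, here is a minimal self-contained sketch that applies it to the sample data from the question. The class name CurrentAddressDemo and the method FromLabelToEnd are hypothetical, and joining the lines with \n is an assumption about how the scraped string is delimited.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrentAddressDemo
{
    // Same pattern and options as in the answer's snippet.
    private static readonly Regex LabelToEnd = new Regex(
        "(^Current Address:.+)",
        RegexOptions.Singleline | RegexOptions.Multiline);

    // Returns everything from "Current Address:" to the end of the input,
    // or an empty string when the label is not found.
    public static string FromLabelToEnd(string subject)
    {
        Match m = LabelToEnd.Match(subject);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        // Sample text from the question, joined with \n (an assumption).
        string scraped =
            "Full Name: John Smith\n" +
            "May Go By: John C Smith\n" +
            "Johnathon Smith\n" +
            "Age: 45\n" +
            "Current Address:\n" +
            "1234 SE 2nd st\n" +
            "Los Angeles, CA 12345";

        // Prints the label line followed by the two address lines.
        Console.WriteLine(FromLabelToEnd(scraped));
    }
}
```

Since .+ is greedy and the dot matches newlines here, the first match runs to the end of the string, so a single Match call is enough when each page contains one address block.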
Upvotes: 2