Reputation: 1101
I am trying to find the best regular expression pattern to extract a substring from a string.
The string is of the form:
0816606366.Univ.of.Minnesota.Pr.Minnesota.Messenia.Expedition.Reconstructing.a.Bronze.Age.Regional.Environment.Jun.1972.pdf
I would like to create a regex that gives me everything after the first period. In this case, the required substring would be:
Univ.of.Minnesota.Pr.Minnesota.Messenia.Expedition.Reconstructing.a.Bronze.Age.Regional.Environment.Jun.1972.pdf
I tried
\w+
\w*
[\w]*
and everything in between, but I'm just not able to get the result I want. Could someone please point me in the right direction?
Thank you
Edit: My apologies, I forgot to mention the programming language. I am using Python and its built-in re module.
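The patterns tried above all stop at the first period, because \w matches letters, digits and underscores but not ".". A quick Python check on the string from the question:

```python
import re

s = ("0816606366.Univ.of.Minnesota.Pr.Minnesota.Messenia.Expedition."
     "Reconstructing.a.Bronze.Age.Regional.Environment.Jun.1972.pdf")

# \w+ only matches word characters, so the match ends at the first "."
first = re.match(r"\w+", s).group()
print(first)  # 0816606366
```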
Upvotes: 0
Views: 87
Reputation: 14685
There are many ways to do this, as you can see above. The way I prefer is:
^[^.]*\.(.*)$
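Since the question uses Python's re module, here is a minimal sketch of how this pattern might be applied with re.match, reading the result from capturing group 1:

```python
import re

s = ("0816606366.Univ.of.Minnesota.Pr.Minnesota.Messenia.Expedition."
     "Reconstructing.a.Bronze.Age.Regional.Environment.Jun.1972.pdf")

# ^[^.]*  everything before the first period (possibly empty)
# \.      the first period itself
# (.*)$   capture the rest of the string
m = re.match(r"^[^.]*\.(.*)$", s)
rest = m.group(1) if m else None
print(rest)
```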
You can test all sorts of methods out on the fly here:
Upvotes: 0
Reputation: 391
You should certainly read the manual before posting a question like this. If you have a Unix-like environment with the Perl documentation installed, this should be your first stop:
perldoc perlre
Alternatively, you can read the documentation online.
perl -e '"ab.cd.ef.gh" =~ m/[^.]+\.(.+)/; print $1'
[.] # Use square brackets to match any one of a given set of characters.
[^.] # Use the caret symbol to invert the matching set.
[^.]+ # The plus symbol matches one or more of the previous symbol.
\. # The escaping backslash and period match a literal period character.
() # Use parentheses to capture a submatch.
(.+) # Use the period to match any one character and the plus to repeat it one or more times.
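Because the asker is on Python, here is a rough Python counterpart of the Perl one-liner. Perl's =~ m// match is unanchored, so re.search is the closest equivalent; the dot after [^.]+ is escaped here so it only matches a literal period:

```python
import re

# [^.]+ grabs "ab", \. consumes the first ".", (.+) captures the rest
m = re.search(r"[^.]+\.(.+)", "ab.cd.ef.gh")
result = m.group(1) if m else None
print(result)  # cd.ef.gh
```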
Here's a great tool for building regular expressions:
http://txt2re.com/
Upvotes: 0
Reputation: 168655
Simple regex to separate the first part from the rest:
/^.+?\.(.+)$/
Then just grab the content of capturing group 1.
To explain it:
^ and $
match the start and end of the string.
.+?
is a non-greedy match for one or more of any character. It is non-greedy (denoted by the question mark) because a greedy .+ would run on to the last dot in the string; this way it stops at the first dot, allowing the rest of the expression to match.
\.
is an escaped, literal dot character, which is our delimiter.
(.+)
is another match for one or more of any character; this time it's greedy because we don't mind, since there's nothing after it anyway. It is wrapped in parentheses to make it a capturing group, so we can extract it from the match.
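To see why the non-greedy .+? matters, here is a small Python comparison. It uses a shortened version of the filename from the question, purely for readability:

```python
import re

s = "0816606366.Univ.of.Minnesota.Jun.1972.pdf"  # shortened example string

# Non-greedy: .+? stops at the FIRST dot, so group 1 is everything after it
lazy = re.match(r"^.+?\.(.+)$", s).group(1)

# Greedy: .+ backtracks only as far as the LAST dot, so group 1 is just "pdf"
greedy = re.match(r"^.+\.(.+)$", s).group(1)

print(lazy)    # Univ.of.Minnesota.Jun.1972.pdf
print(greedy)  # pdf
```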
You haven't specified the language you're working in, but a generic bit of code could look something like this:
var output = input.replace(/^.+?\.(.+)$/,"$1");
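Since the asker later said they are using Python, a sketch of the same replacement with re.sub might look like this. Note that Python writes the backreference as \1 in the replacement string rather than JavaScript's $1:

```python
import re

input_str = ("0816606366.Univ.of.Minnesota.Pr.Minnesota.Messenia.Expedition."
             "Reconstructing.a.Bronze.Age.Regional.Environment.Jun.1972.pdf")

# The pattern matches the whole string, so replacing it with group 1
# leaves only the part after the first dot.
output = re.sub(r"^.+?\.(.+)$", r"\1", input_str)
print(output)
```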
Hope that helps.
Upvotes: 4
Reputation: 3395
^[^\.]+\.(.+)$
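The backslash before the dot inside the character class is harmless but not needed. Compared with ^[^.]*\.(.*)$, the practical difference is + versus *: this pattern requires at least one character before the first period. A small Python check, using a hypothetical name that starts with a period:

```python
import re

plus = re.compile(r"^[^\.]+\.(.+)$")
star = re.compile(r"^[^.]*\.(.*)$")

name = ".hidden.txt"  # hypothetical input starting with "."

print(plus.match(name))           # None: [^\.]+ needs at least one non-dot first
print(star.match(name).group(1))  # hidden.txt: [^.]* may match nothing
```

For the filename in the question, both patterns capture the same substring.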
Upvotes: 2
Reputation: 256581
\d+\.(.+)
and replacement is
$1
Explanation:
\d
match a digit
\d+
match one or more digits
\d+\.
followed by a "."
\d+\..+
followed by anything
\d+\.(.+)
capture the "anything" chunk
I tested it at RegEx Planet:
Regular Expression: \d+\.(.+)
Replacement: $1
Test String#1: 0816606366.Univ.of.Minnesota.Pr.Minnesota.Messenia.Expedition.Reconstructing.a.Bronze.Age.Regional.Environment.Jun.1972.pdf
Result: Univ.of.Minnesota.Pr.Minnesota.Messenia.Expedition.Reconstructing.a.Bronze.Age.Regional.Environment.Jun.1972.pdf
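The same test can be reproduced in Python with re.sub, using \1 in place of $1. One caveat worth knowing: the pattern is not anchored, so it relies on the prefix being all digits. The second string below is a hypothetical example with a non-digit in the prefix, where the match lands on "1972.pdf" instead:

```python
import re

test_string = ("0816606366.Univ.of.Minnesota.Pr.Minnesota.Messenia.Expedition."
               "Reconstructing.a.Bronze.Age.Regional.Environment.Jun.1972.pdf")

# Same pattern and replacement as the RegEx Planet test above
result = re.sub(r"\d+\.(.+)", r"\1", test_string)
print(result)

# Hypothetical prefix containing "X": no digits-then-dot match at the start,
# so only the later "1972.pdf" is rewritten to "pdf"
odd = re.sub(r"\d+\.(.+)", r"\1", "081660636X.Some.Title.Jun.1972.pdf")
print(odd)  # 081660636X.Some.Title.Jun.pdf
```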
Upvotes: 1