Reputation: 631
For the following bit of example code, I'm only interested in the last three bits separated by backslashes (e.g., Family/Genus/Species Name).
So for:
Magnoliopsida/Dilleniidae/Malvales/Malvaceae/Abutilon/Abutilon_theophrasti
Magnoliopsida/Rosidae/Euphorbiales/Euphorbiaceae/Acalypha/Acalypha_rhomboidea
Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_negundo
Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_nigrum
I want:
Malvaceae/Abutilon/Abutilon_theophrasti
Euphorbiaceae/Acalypha/Acalypha_rhomboidea
Aceraceae/Acer/Acer_negundo
Aceraceae/Acer/Acer_nigrum
How do I go about accomplishing this with regex?
Edit: I'm using Notepad++'s Replace functionality with regular expressions. I'm able to "Find" what I want to replace with ^[^/]+/[^/]+/[^/]+[^/]/ but when I replace it with nothing, it does something weird. Any suggestions?
Upvotes: 1
Views: 67
Reputation: 67968
In Python:
import re
x = "Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_nigrum"
# Skip the first three components and capture everything after them.
pattern = re.compile(r"\w+/\w+/\w+/(\S+)")
y = pattern.match(x).groups()
print(y)
Output is ('Aceraceae/Acer/Acer_nigrum',)
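Note that the pattern above skips exactly three leading components, so it only gives the last three when every path has six. A minimal sketch that anchors at the end of the string instead is below; the helper name tail_three is hypothetical and not part of the original answer.

```python
import re

# Anchor at the end of the string ($) instead of counting leading components.
TAIL = re.compile(r"[^/]+/[^/]+/[^/]+$")

def tail_three(path):
    """Hypothetical helper: return the last three slash-separated
    components, or the path unchanged if it has fewer than three."""
    m = TAIL.search(path)
    return m.group(0) if m else path

print(tail_three("Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_nigrum"))
```

This should work regardless of how many taxonomic levels precede the family name.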
Upvotes: 0
Reputation: 631
Okay, figured it out...
I can search: \n^[^/]+/[^/]+/[^/]+[^/]/
and replace with: \n
to get more-or-less what I want.
Thanks all!
Upvotes: 0
Reputation: 11479
Since the user specified Notepad++ as the application, I suggest replacing
^.+/(\w+/\w+/\w+)
with
$1
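For anyone scripting this outside Notepad++, the same greedy-match idea can be sketched in Python. This is only an illustration under stated assumptions: the helper name last_three is hypothetical, and Python's re.sub writes the backreference as \1 rather than $1.

```python
import re

# The greedy ".+" consumes as much of the line as it can, so the
# capture group is left holding only the final three components.
PATTERN = re.compile(r"^.+/(\w+/\w+/\w+)")

def last_three(path):
    """Hypothetical helper: replace the whole match with capture group 1."""
    return PATTERN.sub(r"\1", path)

lines = [
    "Magnoliopsida/Dilleniidae/Malvales/Malvaceae/Abutilon/Abutilon_theophrasti",
    "Magnoliopsida/Rosidae/Euphorbiales/Euphorbiaceae/Acalypha/Acalypha_rhomboidea",
    "Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_negundo",
    "Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_nigrum",
]
for line in lines:
    print(last_three(line))
```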
Upvotes: 0
Reputation: 93636
Don't use a regex. Regexes are not a magic wand you wave at every problem that involves strings.
If you're using PHP, then use the explode function to break the components into an array, and then use the last three elements of the array.
$name = 'Magnoliopsida/Dilleniidae/Malvales/Malvaceae/Abutilon/Abutilon_theophrasti';
$parts = explode( '/', $name );
$n = count($parts);
print $parts[$n-3] . '/' . $parts[$n-2] . '/' . $parts[$n-1];
Other languages will have similar functions.
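As one illustration of the same split-and-take-the-tail approach in another language, here is a minimal Python sketch. The helper name last_three_parts and its sep parameter are hypothetical, chosen only for this example.

```python
def last_three_parts(path, sep="/"):
    """Hypothetical helper: split on sep and rejoin the last three pieces."""
    return sep.join(path.split(sep)[-3:])

name = "Magnoliopsida/Dilleniidae/Malvales/Malvaceae/Abutilon/Abutilon_theophrasti"
print(last_three_parts(name))
```

Slicing with [-3:] also copes with paths that have fewer than three components, in which case the path comes back unchanged.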
Also, / is a slash, not a backslash. \ is a backslash.
Upvotes: 0