user1301593

Reputation: 631

How to extract only the last three parts of a string

For the following bit of example code, I'm only interested in the last three bits separated by backslashes (e.g., Family/Genus/Species Name).

So for:

Magnoliopsida/Dilleniidae/Malvales/Malvaceae/Abutilon/Abutilon_theophrasti
Magnoliopsida/Rosidae/Euphorbiales/Euphorbiaceae/Acalypha/Acalypha_rhomboidea
Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_negundo
Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_nigrum

I want:

Malvaceae/Abutilon/Abutilon_theophrasti
Euphorbiaceae/Acalypha/Acalypha_rhomboidea
Aceraceae/Acer/Acer_negundo
Aceraceae/Acer/Acer_nigrum

How do I go about accomplishing this with regex?

Edit: I'm using Notepad++'s Replace functionality with regular expressions. I'm able to "Find" what I want to replace with ^[^/]+/[^/]+/[^/]+[^/]/ but when I replace it with nothing, it does something weird. Any suggestions?

Upvotes: 1

Views: 67

Answers (5)

vks

Reputation: 67968

In Python:

import re

# Skip the first three components and capture everything after them.
x = "Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_nigrum"
pattern = re.compile(r"\w+/\w+/\w+/(\S+)")
y = pattern.match(x).groups()
print(y)

Output is ('Aceraceae/Acer/Acer_nigrum',)

Upvotes: 0

user1301593

Reputation: 631

Okay, figured it out...

I can search: \n^[^/]+/[^/]+/[^/]+[^/]/ and replace with: \n

to get more-or-less what I want.

Thanks all!

Upvotes: 0

Buzz

Reputation: 1907

You could try something like this:

(/(\w)*){3}$
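
For anyone who wants to try this outside Notepad++, here is a rough Python sketch (Python is my choice here, and the names TAIL_PATTERN, path and last_three are made up for illustration). It suggests the match starts with the slash in front of the family name, so that one character probably has to be dropped afterwards:

```python
import re

# Pattern from the answer above: three "/word" groups anchored at the end of the string.
TAIL_PATTERN = re.compile(r"(/(\w)*){3}$")

path = "Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_nigrum"

match = TAIL_PATTERN.search(path)
# The whole match begins with "/", so drop that one leading character.
# If nothing matches (fewer than three slashes), fall back to the input unchanged.
last_three = match.group(0)[1:] if match else path

print(last_three)
```

Note that this uses search rather than match, since the pattern is anchored at the end of the string rather than the start.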

Upvotes: 1

Charles

Reputation: 11479

Since the user specified Notepad++ as the application, I suggest replacing

^.+/(\w+/\w+/\w+)

with

$1
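
If you want to sanity-check the replacement before running it on a real file, the same find/replace can be sketched in Python. This is only an approximation of what Notepad++ does: the variable names are illustrative, and Python's re module writes the first capture group as \1 where the replacement above uses $1.

```python
import re

lines = """Magnoliopsida/Dilleniidae/Malvales/Malvaceae/Abutilon/Abutilon_theophrasti
Magnoliopsida/Rosidae/Euphorbiales/Euphorbiaceae/Acalypha/Acalypha_rhomboidea
Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_negundo
Magnoliopsida/Rosidae/Sapindales/Aceraceae/Acer/Acer_nigrum"""

# Same find pattern as above. re.MULTILINE makes ^ match at the start of every line,
# and the greedy .+ swallows everything up to the last slash that still leaves
# three word components for the capture group.
trimmed = re.sub(r"^.+/(\w+/\w+/\w+)", r"\1", lines, flags=re.MULTILINE)

print(trimmed)
```

Because the whole prefix and the kept part are matched together, each line is rewritten in a single step, with no leftover text for the pattern to match a second time.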

Upvotes: 0

Andy Lester

Reputation: 93636

Don't use a regex. Regexes are not a magic wand you wave at every problem that involves strings.

If you're using PHP, then use the explode function to break the components into an array, and then use the last three elements of the array.

$name = 'Magnoliopsida/Dilleniidae/Malvales/Malvaceae/Abutilon/Abutilon_theophrasti';
$parts = explode( '/', $name );
$n = count($parts);
print $parts[$n-3] . ' ' . $parts[$n-2] . ' ' . $parts[$n-1];

Other languages will have similar functions.
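
As one example of a similar function in another language, here is a minimal Python sketch of the same split-and-take-the-tail idea. The helper name last_parts and its parameters are hypothetical, and unlike the PHP snippet it rejoins the pieces with slashes, which is what the question asks for:

```python
def last_parts(path, count=3, sep="/"):
    """Return the last `count` components of `path`, rejoined with `sep`."""
    # split() breaks the string on the separator; the [-count:] slice keeps the tail.
    # A path with fewer than `count` components comes back unchanged.
    return sep.join(path.split(sep)[-count:])


name = "Magnoliopsida/Dilleniidae/Malvales/Malvaceae/Abutilon/Abutilon_theophrasti"
print(last_parts(name))
```

For the sample name this should print Malvaceae/Abutilon/Abutilon_theophrasti.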

Also / is a slash, not a backslash. \ is backslash.

Upvotes: 0
