Dactylus

Reputation: 1

Extract substring between 2 strings in Python

I have a pandas DataFrame with a string column that I want to split into several more columns.

Some rows of the DF look like this:

COLUMN

ORDP//NAME/iwantthispart/REMI/MORE TEXT
/REMI/SOMEMORETEXT
/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS
/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT

So basically I want everything after '/NAME/' and up to the next '/'. However, not every row has the '/NAME/iwantthispart/' field, as can be seen in the second row.

I've tried using split functions, but ended up with the wrong results.

mt['COLUMN'].apply(lambda x: x.split('/NAME/')[-1])

This just gave me everything after the /NAME/ part, and in the cases that there was no /NAME/ it returned the full string to me.
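
A minimal sketch of the behavior described above, using plain Python strings rather than the DataFrame. The `rows` list and the helper name `after_name` are assumptions made for illustration; the helper is one possible way to return nothing when the marker is missing, not necessarily the best fit for a large frame.

```python
# Sample rows copied from the question (assumed to be plain strings).
rows = [
    "ORDP//NAME/iwantthispart/REMI/MORE TEXT",
    "/REMI/SOMEMORETEXT",
    "/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS",
    "/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT",
]

# The attempted approach: when '/NAME/' is absent, split() returns a
# one-element list, so [-1] is the whole original string.
split_results = [r.split("/NAME/")[-1] for r in rows]


def after_name(text, marker="/NAME/"):
    """Hypothetical helper: text between `marker` and the next '/', else None."""
    _, sep, tail = text.partition(marker)
    if not sep:  # marker not found in this row
        return None
    return tail.split("/", 1)[0]


fixed_results = [after_name(r) for r in rows]

print(split_results)
print(fixed_results)
```

With a DataFrame, the same helper could be passed to `mt['COLUMN'].apply(after_name)`.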

Does anyone have some tips or solutions? Help is much appreciated! (the bullets are to make it more readable and are not actually in the data).

Upvotes: 0

Views: 3887

Answers (2)

Inder

Reputation: 3826

These two lines will give you the second word regardless of whether the first word is NAME or not:

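# Keep only the first "word/word/" pair (this overwrites the original column)
mt["COLUMN"] = mt["COLUMN"].str.extract(r"(\w+/\w+/)")
# From that pair, take the slash and the word that follows it
mt["COLUMN"].str.extract(r"(/\w+)")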

This will give the following result as a column in pandas dataframe:

/iwantthispart
/SOMEMORETEXT
/iwantthispart
/iwantthispart

And in case you are only interested in the lines that contain NAME, this will work for you just fine:

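# Match the literal text NAME (a leading backslash would make \N an invalid escape)
mt["COLUMN"] = mt["COLUMN"].str.extract(r"(NAME/\w+/)")
# From "NAME/word/", take the slash and the word that follows it
mt["COLUMN"].str.extract(r"(/\w+)")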

This will give the following result:

/iwantthispart
NaN
/iwantthispart
/iwantthispart

Upvotes: 0

sacuL

Reputation: 51425

You could use str.extract to extract the pattern of choice, using a regex:

# Generally, to match all word characters (raw string so \w is not treated as a string escape):
df.COLUMN.str.extract(r'NAME/(\w+)', expand=False)

OR

# More specifically, to match everything up to the next slash:
df.COLUMN.str.extract(r'NAME/([^/]*)', expand=False)

Both of which return:

0    iwantthispart
1              NaN
2    iwantthispart
3    iwantthispart
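
For a runnable comparison of the two patterns, here is a small sketch. The first four rows come from the question; the fifth row, containing a space and a hyphen, is a hypothetical addition to show where `\w+` and `[^/]*` differ. The variable names `word_only` and `up_to_slash` are likewise just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "COLUMN": [
        "ORDP//NAME/iwantthispart/REMI/MORE TEXT",
        "/REMI/SOMEMORETEXT",
        "/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS",
        "/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT",
        # Hypothetical row, not in the original data:
        "/ORDP//NAME/John Smith-Jones/REMI/X",
    ]
})

# expand=False returns a Series when the pattern has a single capture group.
# \w+ stops at the first non-word character (space, hyphen, slash, ...).
word_only = df["COLUMN"].str.extract(r"NAME/(\w+)", expand=False)

# [^/]* captures everything up to the next slash.
up_to_slash = df["COLUMN"].str.extract(r"NAME/([^/]*)", expand=False)

# Rows without NAME/ produce NaN in both results.
print(pd.DataFrame({"word_only": word_only, "up_to_slash": up_to_slash}))
```

If the field can contain spaces or punctuation, the `[^/]*` form is probably the safer choice; for the four rows in the question both patterns agree.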

Upvotes: 5
