Reputation: 1
I have a pandas DataFrame with a string column that I want to split into several more columns.
Some rows of the DF look like this:
COLUMN
ORDP//NAME/iwantthispart/REMI/MORE TEXT
/REMI/SOMEMORETEXT
/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS
/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT
So basically I want everything after '/NAME/' and up to the next '/'. However, not every row has the '/NAME/iwantthispart/' field, as can be seen in the second row.
I've tried using split functions, but ended up with the wrong results.
mt['COLUMN'].apply(lambda x: x.split('/NAME/')[-1])
This just gave me everything after the /NAME/ part, and in cases where there was no /NAME/ it returned the full string.
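A minimal reproduction of that behaviour, assuming the four sample rows above are loaded into a DataFrame called `mt` with a single column `COLUMN`. When the separator is absent, `split` returns a one-element list, so `[-1]` is the whole string:

```python
import pandas as pd

# Sample rows copied from the question; `mt` and `COLUMN` are the names used above.
mt = pd.DataFrame({
    "COLUMN": [
        "ORDP//NAME/iwantthispart/REMI/MORE TEXT",
        "/REMI/SOMEMORETEXT",
        "/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS",
        "/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT",
    ]
})

# split() on a missing separator returns [whole_string], so [-1] yields the full string.
after_name = mt["COLUMN"].apply(lambda x: x.split("/NAME/")[-1])
print(after_name.tolist())
```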
Does anyone have some tips or solutions? Help is much appreciated! (the bullets are to make it more readable and are not actually in the data).
Upvotes: 0
Views: 3887
Reputation: 3826
These two lines give you the second word of the first word/word/ pair, whether or not the first word is NAME:
mt["COLUMN"] = mt["COLUMN"].str.extract(r"(\w+/\w+/)")
mt["COLUMN"].str.extract(r"(/\w+)")
This gives the following result as a column of the pandas DataFrame:
/iwantthispart
NaN
/iwantthispart
/iwantthispart
And in case you are only interested in the rows that contain NAME, this will work:
mt["COLUMN"] = mt["COLUMN"].str.extract(r"(NAME/\w+/)")
mt["COLUMN"].str.extract(r"(/\w+)")
This will give the following result:
/iwantthispart
NaN
/iwantthispart
/iwantthispart
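For reference, a self-contained sketch of the NAME-only variant above, assuming the sample rows from the question sit in a column called `COLUMN`. It passes `expand=False` so each step returns a Series, and it keeps the intermediate result in a separate variable instead of overwriting the column; both are choices made for this example:

```python
import pandas as pd

# Sample rows from the question; the DataFrame and column names follow the question.
mt = pd.DataFrame({
    "COLUMN": [
        "ORDP//NAME/iwantthispart/REMI/MORE TEXT",
        "/REMI/SOMEMORETEXT",
        "/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS",
        "/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT",
    ]
})

# Step 1: keep only the "NAME/<value>/" fragment; rows without it become NaN.
fragment = mt["COLUMN"].str.extract(r"(NAME/\w+/)", expand=False)

# Step 2: pull "/<value>" out of that fragment; NaN rows stay NaN.
value = fragment.str.extract(r"(/\w+)", expand=False)
print(value)
```

Note that the extracted values still carry the leading slash, as in the output listed above.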
Upvotes: 0
Reputation: 51425
You could use str.extract to extract the pattern of choice, using a regex:
# Generally, to match all word characters:
df.COLUMN.str.extract(r'NAME/(\w+)')
OR
# More specifically, to match everything up to the next slash:
df.COLUMN.str.extract('NAME/([^/]*)')
Both of which return:
0 iwantthispart
1 NaN
2 iwantthispart
3 iwantthispart
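A self-contained sketch of this approach on the question's sample rows, storing the result in a new column. The column name `NAME_PART` is hypothetical, and `expand=False` is used here so that a single capture group comes back as a Series rather than a one-column DataFrame:

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "COLUMN": [
        "ORDP//NAME/iwantthispart/REMI/MORE TEXT",
        "/REMI/SOMEMORETEXT",
        "/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS",
        "/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT",
    ]
})

# Capture everything after "/NAME/" up to the next slash.
# Rows without "/NAME/" get NaN. "NAME_PART" is a hypothetical column name.
df["NAME_PART"] = df["COLUMN"].str.extract(r"/NAME/([^/]*)", expand=False)
print(df["NAME_PART"])
```

The `[^/]*` form is the safer of the two patterns if the value can contain spaces or punctuation, since `\w+` stops at the first non-word character.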
Upvotes: 5