ApacheOne

Reputation: 245

Extract values from a column in a dataframe

I have the following dataframe:

A
url/3gth33/item/PO151302
url/3jfj6/item/S474-3
url/dfhk34j/item/4964114989191
url/sdfkj3k4/place/9b81f6fd
url/as3f343d/thing/ecc539ec

I'm looking to extract /item/ together with the value that follows it, for every row that contains it.

The end result should be:

item
/item/PO151302
/item/S474-3
/item/4964114989191

Here is what I've tried:

df['A'] = df['A'].str.extract(r'(/item/\w+\D+\d+$)')

This returns what I need, except for the integer-only values.

Based on the regex docs I'm reading, this should grab all instances.

What am I missing here?

Upvotes: 0

Views: 93

Answers (2)

rpanai

Reputation: 13447

This is not a regex solution, but it could come in handy in some situations.

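keyword = "/item/"
# Multiplying a string by True keeps it and by False gives "", so rows
# without the keyword end up as empty strings.
df["item"] = ((keyword + df["A"].str.split(keyword).str[-1]) *
              df["A"].str.contains(keyword))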

which returns

                               A                 item
0        url/3gth33/item/PO151302       /item/PO151302
1           url/3jfj6/item/S474-3         /item/S474-3
2  url/dfhk34j/item/4964114989191  /item/4964114989191
3     url/sdfkj3k4/place/9b81f6fd                     
4     url/as3f343d/thing/ecc539ec                     
5                                                     

If you want only the rows where item is not empty, you could use

df[df["item"].ne("")][["item"]]
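
For reference, here is a self-contained sketch of the same idea run on the sample data from the question. It swaps the boolean multiplication for Series.where, which is a substitution on my part rather than the code above. The names has_item, tail and result are just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"A": [
    "url/3gth33/item/PO151302",
    "url/3jfj6/item/S474-3",
    "url/dfhk34j/item/4964114989191",
    "url/sdfkj3k4/place/9b81f6fd",
    "url/as3f343d/thing/ecc539ec",
]})

keyword = "/item/"

# True for rows that contain the keyword (plain substring test, no regex)
has_item = df["A"].str.contains(keyword, regex=False)

# Text after the last occurrence of the keyword
tail = df["A"].str.split(keyword).str[-1]

# Keep keyword + tail where the keyword is present, otherwise an empty string
df["item"] = (keyword + tail).where(has_item, "")

# Only the rows that actually have an item
result = df.loc[df["item"].ne(""), ["item"]]
print(result)
```

Using where makes the "keep or blank out" step explicit, which may be easier to read than relying on string-times-boolean behaviour.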

Upvotes: 0

user17242583

Use /item/.+ to match /item/ and everything after it. Also, if you put ?P<foo> at the beginning of a group, e.g. (?P<foo>...), the column for that group in the dataframe returned by str.extract will be named after whatever is inside the <...>:

item = df['A'].str.extract('(?P<item>/item/.+)').dropna()

Output:

>>> item
                  item
0       /item/PO151302
1         /item/S474-3
2  /item/4964114989191
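
As for why the original pattern skips the integer-only value: as far as I can tell, it is because \D+ requires at least one non-digit character somewhere after /item/, and 4964114989191 has none. Here is a small sketch on the question's sample data that compares both patterns. The variable names attempt and item are only for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"A": [
    "url/3gth33/item/PO151302",
    "url/3jfj6/item/S474-3",
    "url/dfhk34j/item/4964114989191",
    "url/sdfkj3k4/place/9b81f6fd",
    "url/as3f343d/thing/ecc539ec",
]})

# Pattern from the question: \w+ then \D+ (at least one non-digit) then \d+
# An all-digit ID gives \D+ nothing to match, so row 2 comes back as NaN.
attempt = df["A"].str.extract(r"(/item/\w+\D+\d+$)")
print(attempt[0].isna().tolist())  # [False, False, True, True, True]

# Simpler pattern: /item/ followed by anything, captured in a named group
item = df["A"].str.extract(r"(?P<item>/item/.+)").dropna()
print(item)
```

The first two rows only match the original pattern because they happen to contain a letter or a hyphen before the trailing digits.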

Upvotes: 2
