Ely

Reputation: 23

Python Pandas regex outputting NaN

I have a pandas dataframe column with strings like this (they were supposed to be dictionaries but became strings after being scraped into a CSV):

{"id":307,"name":"Drinks","slug":"food/drinks"...

I'm trying to extract the values for "name", so in this case it would be "Drinks".

The code I have right now (shown below) keeps outputting NaN for the entire dataframe.

df['extracted_category'] = df.category.str.extract('("name":*(?="slug"))')

What's wrong with my regex? Thanks!

Upvotes: 2

Views: 147

Answers (3)

Kausar Ahmad

Reputation: 21

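First, the outermost parentheses in ("name":*(?="slug")) need to go: they form the first capture group, and str.extract returns whatever that group matches, which is not where the value of "name" lies. The pattern also never matches in the first place: :* means "zero or more colons", not "any text", so the lookahead (?="slug") would have to succeed immediately after "name":, which it cannot, and every row comes back as NaN.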

A simpler regex to try would be "name":"(\w*)" (note: keep the part of the regex that you want extracted inside the parentheses). This regex looks for the following string:

    "name":"

and extracts all the word characters (letters, digits and underscores) that follow it, via (\w*), before stopping at the closing double quotation mark.

You can test your regex at: https://regex101.com/
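
To see the pattern above at work in pandas, here is a minimal sketch. The one-row dataframe is made-up sample data that mirrors the string in the question, and expand=False is only my choice so that a Series comes back instead of a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column.
df = pd.DataFrame(
    {"category": ['{"id":307,"name":"Drinks","slug":"food/drinks"}']}
)

# One capture group, so str.extract returns only the value of "name".
# expand=False yields a Series rather than a one-column DataFrame.
df["extracted_category"] = df["category"].str.extract(
    r'"name":"(\w*)"', expand=False
)

print(df["extracted_category"].tolist())  # ['Drinks']
```

Note that \w does not match spaces or punctuation, so a name such as "Food & Drinks" would not match and would still give NaN. If your names can look like that, a group such as ([^"]*) is one option to try.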

Upvotes: 0

Jimmys

Reputation: 377

Hi @Ellie, also check this approach:

x = {"id":307,"name":"Drinks","slug":"food/drinks"}
result = [(key, value) for key, value in x.items() if key.startswith("name")]
print(result)
[('name', 'Drinks')]

Upvotes: 0

Dishin H Goyani

Reputation: 7713

It is better to convert the strings into a dataframe. You can use eval and pd.Series for that, like this:

# sample dataframe
df
                                          category
0  {"id":307,"name":"Drinks","slug":"food/drinks"}

df.category.apply(lambda x : pd.Series(eval(x)))
    id    name         slug
0  307  Drinks  food/drinks

Or convert only the strings to dictionaries using eval:

df['category'] = df.category.apply(eval)

df.category.str["name"]
0    Drinks
Name: category, dtype: object
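
Since eval runs whatever code the string contains, a variant that only parses data may be preferable for scraped input. The sketch below swaps in json.loads, which is my substitution and not part of the approach above. It assumes every row holds valid JSON, which the truncated sample in the question suggests but does not confirm, and the one-row dataframe is again made-up sample data.

```python
import json

import pandas as pd

# Hypothetical sample data mirroring the question's column.
df = pd.DataFrame(
    {"category": ['{"id":307,"name":"Drinks","slug":"food/drinks"}']}
)

# json.loads only parses data; unlike eval, it never executes code.
parsed = df["category"].apply(json.loads)

# .str[...] also indexes into dict elements, picking each row's "name".
names = parsed.str["name"]

# Expand every key into its own column, keeping the original index.
expanded = pd.DataFrame(parsed.tolist(), index=df.index)

print(names.tolist())             # ['Drinks']
print(expanded.columns.tolist())  # ['id', 'name', 'slug']
```

If the scraped text turns out to be Python-style dicts (single quotes, True/None) rather than JSON, ast.literal_eval from the standard library is another safe parser to consider.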

Upvotes: 3
