Reputation: 23
I have a pandas DataFrame column containing strings like this (they were supposed to be dictionaries, but became strings after scraping into a CSV):
{"id":307,"name":"Drinks","slug":"food/drinks"...
I'm trying to extract the values for "name", so in this case it would be "Drinks".
The code I have right now (shown below) keeps outputting NaN for the entire dataframe.
df['extracted_category'] = df.category.str.extract('("name":*(?="slug"))')
What's wrong with my regex? Thanks!
Upvotes: 2
Views: 147
Reputation: 21
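First, the outermost parentheses in ("name":*(?="slug")) need to go: they form the first capture group, so the extracted value would be whatever that group matches, which is not where the value of "name" lies. Note also that :* matches zero or more colons rather than arbitrary characters, so for a string like the one in the question the lookahead for "slug" fails right after "name": and the result is NaN.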
A simpler regex to try would be "name":"(\w*)" (note: keep the part of the regex that you want extracted inside the parentheses). This regex looks for the following string:
"name":"
and extracts the word characters (letters, digits, and underscores) that follow it (\w*), which must be followed immediately by the closing double quotation mark.
You can test your regex at: https://regex101.com/
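To make the fix concrete, here is a minimal sketch of the corrected extraction. The sample rows are made up for illustration, and the pattern swaps \w* for [^"]* (an assumption on my part, not part of the regex above) so that names containing spaces are still captured.

```python
import pandas as pd

# Hypothetical sample data mirroring the column described in the question.
df = pd.DataFrame({
    "category": [
        '{"id":307,"name":"Drinks","slug":"food/drinks"}',
        '{"id":12,"name":"Board Games","slug":"games/board"}',
    ]
})

# Only the capture group (the text between the quotes after "name":) is returned.
# [^"]* matches any run of characters up to the closing quote, including spaces.
df["extracted_category"] = df["category"].str.extract(
    r'"name":"([^"]*)"', expand=False
)

print(df["extracted_category"].tolist())
```

With expand=False and a single capture group, str.extract returns a Series, so it can be assigned straight to a new column.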
Upvotes: 0
Reputation: 377
Hi @Ellie, you could also try this approach, which works once the value is a dictionary rather than a string:
x = {"id":307,"name":"Drinks","slug":"food/drinks"}
result = [(key, value) for key, value in x.items() if key.startswith("name")]
print(result)
[('name', 'Drinks')]
Upvotes: 0
Reputation: 7713
It is better to convert it into a DataFrame; you can use eval and pd.Series for that, like this:
# sample dataframe
df
category
0 {"id":307,"name":"Drinks","slug":"food/drinks"}
df.category.apply(lambda x : pd.Series(eval(x)))
id name slug
0 307 Drinks food/drinks
Or convert only the strings to dictionaries using eval:
df['category'] = df.category.apply(eval)
df.category.str["name"]
0 Drinks
Name: category, dtype: object
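Since eval executes whatever the string contains, a possibly safer variant for scraped data is to parse with json.loads instead. This is only a sketch, assuming every cell holds complete, valid JSON; the sample row and the extracted_category and expanded names are illustrative.

```python
import json
import pandas as pd

# Hypothetical sample data; assumes each cell is a complete JSON object.
df = pd.DataFrame({
    "category": ['{"id":307,"name":"Drinks","slug":"food/drinks"}']
})

# Parse each string into a dict without evaluating it as Python code.
parsed = df["category"].apply(json.loads)

# Pull out a single key, as with .str["name"] above.
df["extracted_category"] = parsed.str["name"]

# Or expand every key into its own column.
expanded = pd.json_normalize(parsed.tolist())

print(df["extracted_category"].tolist())
print(list(expanded.columns))
```

If some cells are truncated or otherwise malformed, json.loads raises an error for them, so those rows would need to be handled separately.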
Upvotes: 3