AdrianC

Reputation: 393

Removing List Within Pandas Dataframe

I have the following dataframe:

Index   Recipe_ID   order   content
0       1285        1       Heat oil in a large frypan with lid over mediu...
1       1285        2       Meanwhile, add cauliflower to a pot of boiling...
2       1285        3       Remove lid from chicken and let simmer uncover... 
3       1289        1       To make the dressing, whisk oil, vinegar and m...
4       1289        2       Cook potatoes in a large saucepan of boiling w..

Task: I need to get the contents in one cell:

df = df.groupby('recipe_variation_part_id', as_index=False).agg(lambda x: x.tolist())

This returns the following:

Index   Recipe_ID   order         content
0       1285        [1, 2, 3]     [Heat oil in a large frypan with lid over medi...
1       1289        [1, 2, 3]     [To make the dressing, whisk oil, vinegar and ...
2       1297        [1, 2, 4, 3]  [Place egg in saucepan of cold water and bring...
3       1301        [1, 2]        [Preheat a non-stick frying pan and pan fry th...
4       1309        [2, 3, 4, 1]  [Meanwhile, cook noodles according to package ...

If you look at the first recipe entry, you get the following:

['Heat oil in a large frypan with lid over medium-high heat. Cook onions, garlic and rosemary for a couple of minutes until soft. Add chicken and brown on both sides for a few minutes, then add in tomatoes and olives. Season with salt and pepper and allow to simmer with lid on for 20-25 minutes. ',
 'Meanwhile, add cauliflower to a pot of boiling water and cook for 10 minutes or until soft. Drain and then mash and gently fold in olive oil, parmesan, salt and pepper. ',
 'Remove lid from chicken and let simmer uncovered for five minutes more. Sprinkle with parsley then serve with cauliflower mash. ']

This is what I want, but I need to remove the square brackets so that each recipe's steps end up as a single string rather than a list.

dtype = list

I've tried:

df.applymap(lambda x: x[0] if isinstance(x, list) else x)

Only returns the first entry, not every step

I've tried:

df['content'].str.replace(']', '')

Only returns NANs

I've tried:

df['content'].str.replace(r'(\[\[(?:[^\]|]*\|)?([^\]|]*)\]\])', '')

Only returns NANs

I've tried:

df['content'].str.get(0)

Only returns the first entry

Any help would be greatly appreciated.

If you need any further information, please let me know.

Upvotes: 2

Views: 3512

Answers (1)

Eran Moshe

Reputation: 3208

I've created a little example that might solve this problem for you:

import pandas as pd
df = pd.DataFrame({'order': [1, 1, 2], 'content': ['hello', 'world', 'sof']})
df
Out[4]: 
   order content
0      1   hello
1      1   world
2      2     sof
df.groupby(by=['order']).agg(lambda x: ' '.join(x))
Out[5]: 
           content
order             
1      hello world
2              sof

So, just as in the groupby line in your question, aggregate per group, but use ' '.join(x) instead of x.tolist(). That concatenates everything into one big string instead of a list of strings, and therefore there are no [].
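Applied to the columns in the question, this might look like the sketch below. The sample rows are made up and shortened, and I'm assuming the grouping column is Recipe_ID as shown in the printed table (the groupby call in the question uses a different column name). Sorting by order first keeps the steps in sequence, since the order lists in the question's output are not always sorted. The last part covers the case where the column already holds lists: Series.str.join joins each list into one string, whereas .str.replace returns NaN for values that are not strings.

```python
import pandas as pd

# Hypothetical, shortened sample data mirroring the question's columns
df = pd.DataFrame({
    "Recipe_ID": [1285, 1285, 1285, 1289, 1289],
    "order": [2, 1, 3, 1, 2],
    "content": [
        "Meanwhile, add cauliflower.",
        "Heat oil.",
        "Remove lid.",
        "Make the dressing.",
        "Cook potatoes.",
    ],
})

# Sort so the steps are in sequence; groupby keeps row order within each group
ordered = df.sort_values(["Recipe_ID", "order"])

# Join every step of a recipe into one string instead of collecting a list
joined = (
    ordered.groupby("Recipe_ID")["content"]
    .agg(lambda steps: " ".join(steps))
    .reset_index()
)

# If the column already holds lists (as after tolist()), join them afterwards
listed = ordered.groupby("Recipe_ID")["content"].agg(list).reset_index()
listed["content"] = listed["content"].str.join(" ")

print(joined)
```

Both joined and listed end up with one row per recipe and a plain string in content.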

Upvotes: 3
