SeyoM

Reputation: 1

How to modify grouped data in pandas

I would like to modify grouped data in pandas. I wrote a short piece of code that doesn't work: when I call gr.get_group('Audi') outside of the loop, the data remains unchanged. How can I modify grouped DataFrames, and how do I get from the grouped data back to a single DataFrame afterwards?


import pandas as pd
import numpy as np

d = {'car' : ["Audi", "Audi", "Audi", "BMW", "BMW", "BMW", "FIAT", "FIAT", "FIAT", "FIAT"],
    'year' : [2000, 2001, 1995, 1992, 2003, 2003, 2011, 1982, 1997, 2002]}

df = pd.DataFrame.from_dict(d)
df['new'] = np.nan

gr = df.groupby('car')

for key, val in gr:
    val.loc[val['year']<2000, 'new'] = f'new {key}'

gr.get_group('Audi')

I would like to use this approach because I want to fill the new column in a different way for each group.

For example, for Audi it will usually just assign a variable, while for BMW I want to use the map function:

for key, val in gr:
    if key == 'Audi':
        val.loc[val['year']<2000, 'new'] = f'new {key}'
    elif key == 'BMW':
        pass
        #  here another method
    elif key == 'FIAT':
        pass
        #  here another method
    else:
        val.loc[val['year']<2000, 'new'] = 'UNKNOW'

In the end I would like to get a table like the original DataFrame, but with the `new` column filled in.

Upvotes: 0

Views: 80

Answers (1)

Orfeas Bourchas

Reputation: 363

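The frames you get while iterating over a groupby are separate from df, so changes made to val are lost. Try appending the modified val to a new DataFrame, df_new, with pd.concat on each loop iteration, like below: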

import pandas as pd
import numpy as np

d = {'car' : ["Audi", "Audi", "Audi", "BMW", "BMW", "BMW", "FIAT", "FIAT", "FIAT", "FIAT"],
    'year' : [2000, 2001, 1995, 1992, 2003, 2003, 2011, 1982, 1997, 2002]}

df = pd.DataFrame.from_dict(d)
# object dtype so the column can hold strings as well as NaN
df['new'] = pd.Series(np.nan, index=df.index, dtype=object)
df_new = pd.DataFrame()
gr = df.groupby('car')

for key, val in gr:
    val = val.copy()  # work on an explicit copy of the group
    if key == 'Audi':
        val.loc[val['year']<2000, 'new'] = f'new {key}'
    elif key == 'BMW':
        pass  # here another method
    elif key == 'FIAT':
        pass  # here another method
    else:
        val.loc[val['year']<2000, 'new'] = 'UNKNOW'
    # append the (possibly modified) group to the result
    df_new = pd.concat([df_new, val])

# optional: put the rows back in their original order
df_new = df_new.sort_index()
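
As a quick check of why the results have to be collected at all, here is a minimal sketch on a smaller, made-up frame (the four rows and the `parts` list are assumptions for illustration, not part of the question's data). It suggests that edits to `val` inside the loop never reach `df`, and that `pd.concat` keeps the original index labels, so `sort_index()` can restore the row order.

```python
import numpy as np
import pandas as pd

# small illustrative frame, not the question's data
df = pd.DataFrame({
    "car": ["Audi", "BMW", "Audi", "BMW"],
    "year": [1995, 1992, 2001, 2003],
})
# object dtype so the column can hold strings next to NaN
df["new"] = pd.Series(np.nan, index=df.index, dtype=object)

parts = []  # modified groups, collected for a single concat at the end
for key, val in df.groupby("car"):
    val = val.copy()  # explicit copy of the group
    val.loc[val["year"] < 2000, "new"] = f"new {key}"
    parts.append(val)

# the original frame is untouched by the loop
print(df["new"].isna().all())

# concat keeps the original index labels, so sorting restores row order
df_new = pd.concat(parts).sort_index()
print(df_new)
```

Collecting the groups in a list and calling `pd.concat` once is a variation on the loop above; it should give the same rows while avoiding a concat per iteration.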

You can probably also do this with df.itertuples or some other method that I am not currently aware of.
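
One such alternative, sketched here under assumptions, is to skip the grouped frames entirely and write into df through boolean masks, with one function per brand. The function names (`audi_rule`, `bmw_rule`, `default_rule`), the `rules` dict, and the decade label used to demonstrate `map` are all hypothetical placeholders, and the data is a shortened made-up sample. In this sketch FIAT has no rule of its own, so it falls through to the default.

```python
import numpy as np
import pandas as pd

# shortened illustrative sample, not the question's full data
df = pd.DataFrame({
    "car": ["Audi", "Audi", "BMW", "BMW", "FIAT"],
    "year": [2000, 1995, 1992, 2003, 1982],
})
# object dtype so the column can hold strings next to NaN
df["new"] = pd.Series(np.nan, index=df.index, dtype=object)

# hypothetical per-brand rules: each takes the selected rows
# and returns a Series aligned on their index
def audi_rule(rows):
    return pd.Series("new Audi", index=rows.index)

def bmw_rule(rows):
    # placeholder use of map: label each year with its decade
    return rows["year"].map(lambda y: f"{y // 10 * 10}s")

def default_rule(rows):
    return pd.Series("UNKNOW", index=rows.index)

rules = {"Audi": audi_rule, "BMW": bmw_rule}

old = df["year"] < 2000
for brand in df["car"].unique():
    mask = (df["car"] == brand) & old
    rule = rules.get(brand, default_rule)
    # assignment goes straight into df, so nothing has to be reassembled
    df.loc[mask, "new"] = rule(df.loc[mask])

print(df)
```

Because the assignment targets df itself, the row order is preserved and no concat step is needed.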

Upvotes: 1
