Reputation: 1
I would like to modify grouped data in pandas. I wrote a short piece of code, but it doesn't work: after the loop, gr.get_group('Audi') still returns the unchanged data.
How can I modify the grouped DataFrames, and how can I get from the grouped data back to a single DataFrame afterwards?
import pandas as pd
import numpy as np

d = {'car': ["Audi", "Audi", "Audi", "BMW", "BMW", "BMW", "FIAT", "FIAT", "FIAT", "FIAT"],
     'year': [2000, 2001, 1995, 1992, 2003, 2003, 2011, 1982, 1997, 2002]}
df = pd.DataFrame.from_dict(d)
df['new'] = np.nan
gr = df.groupby('car')
for key, val in gr:
    val.loc[val['year'] < 2000, 'new'] = f'new {key}'
gr.get_group('Audi')  # still shows NaN in 'new'
I would like to use this approach because each group's DataFrame needs a different method to fill the new column.
For example, for Audi it will usually be adding a variable, while for BMW I want to use the map function:
for key, val in gr:
    if key == 'Audi':
        val.loc[val['year'] < 2000, 'new'] = f'new {key}'
    elif key == 'BMW':
        pass
        # here another method
    elif key == 'FIAT':
        pass
        # here another method
    else:
        val.loc[val['year'] < 2000, 'new'] = 'UNKNOW'
At the end I would like to get a table like the original DataFrame, but with the `new` column filled in.
Upvotes: 0
Views: 80
Reputation: 363
Try using pd.concat to append val to df_new at the end of each loop iteration, as shown below:
import pandas as pd
import numpy as np

d = {'car': ["Audi", "Audi", "Audi", "BMW", "BMW", "BMW", "FIAT", "FIAT", "FIAT", "FIAT"],
     'year': [2000, 2001, 1995, 1992, 2003, 2003, 2011, 1982, 1997, 2002]}
df = pd.DataFrame.from_dict(d)
df['new'] = np.nan
df_new = pd.DataFrame()  # collects the modified groups
gr = df.groupby('car')
for key, val in gr:
    print(key, val)
    if key == 'Audi':
        val.loc[val['year'] < 2000, 'new'] = f'new {key}'
    elif key == 'BMW':
        pass
        # here another method
    elif key == 'FIAT':
        pass
        # here another method
    else:
        val.loc[val['year'] < 2000, 'new'] = 'UNKNOW'
    # append the modified group; df itself is left unchanged
    df_new = pd.concat([df_new, val])
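A variant of the loop above, as a rough sketch: copy each group before editing it, collect the pieces in a list, and call pd.concat once at the end. The object dtype for the new column, the parts list, and the final sort_index() are my assumptions, added so that string values fit the column and the original row order comes back. Only the Audi rule from the question is filled in.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "car": ["Audi", "Audi", "Audi", "BMW", "BMW", "BMW", "FIAT", "FIAT", "FIAT", "FIAT"],
    "year": [2000, 2001, 1995, 1992, 2003, 2003, 2011, 1982, 1997, 2002],
})
# Assumption: an object column, so it can hold both NaN and strings.
df["new"] = pd.Series(np.nan, index=df.index, dtype=object)

parts = []  # hypothetical name: holds the modified groups
for key, val in df.groupby("car"):
    val = val.copy()  # work on an explicit copy; edits here do not reach df
    if key == "Audi":
        val.loc[val["year"] < 2000, "new"] = f"new {key}"
    # other brands would get their own rule here
    parts.append(val)

# Each group keeps its original index labels, so sort_index restores row order.
df_new = pd.concat(parts).sort_index()
```

Concatenating once from a list also avoids growing df_new inside the loop.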
You can probably also do this with df.itertuples or some other method that I am not currently aware of.
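One such other method, sketched under assumptions: skip groupby entirely and write into df with boolean masks, one rule per brand. The bmw_labels lookup is hypothetical and only stands in for whatever the map step should do for BMW; the question does not say. FIAT is left without a rule for the same reason.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "car": ["Audi", "Audi", "Audi", "BMW", "BMW", "BMW", "FIAT", "FIAT", "FIAT", "FIAT"],
    "year": [2000, 2001, 1995, 1992, 2003, 2003, 2011, 1982, 1997, 2002],
})
# Assumption: an object column, so it can hold both NaN and strings.
df["new"] = pd.Series(np.nan, index=df.index, dtype=object)

old = df["year"] < 2000  # rows that should receive a value

# Audi: a constant label, as in the question.
audi = (df["car"] == "Audi") & old
df.loc[audi, "new"] = "new Audi"

# BMW: map the year through a lookup table (hypothetical contents).
bmw_labels = {1992: "classic"}
bmw = (df["car"] == "BMW") & old
df.loc[bmw, "new"] = df.loc[bmw, "year"].map(bmw_labels)
```

Because the assignments go through df.loc on the original frame, there is nothing to reassemble afterwards.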
Upvotes: 1