Reputation: 13
how are?
i need to complete my dataframe with data returned by a sum function.
so, I have this code that creates my dataframe:
columns = ['hom_doloso', 'lesao_corp_morte', 'latrocinio',
'hom_por_interv_policial', 'tentat_hom', 'lesao_corp_dolosa', 'estupro',
'hom_culposo', 'lesao_corp_culposa', 'roubo_comercio',
'roubo_residencia', 'roubo_veiculo', 'roubo_carga', 'roubo_transeunte',
'roubo_em_coletivo', 'roubo_banco', 'roubo_cx_eletronico',
'roubo_celular', 'roubo_conducao_saque', 'roubo_apos_saque',
'roubo_bicicleta', 'outros_roubos', 'total_roubos', 'furto_veiculos',
'furto_transeunte', 'furto_coletivo', 'furto_celular',
'furto_bicicleta', 'outros_furtos', 'sequestro',
'extorsao', 'sequestro_relampago', 'estelionato', 'apreensao_drogas',
'posse_drogas', 'trafico_drogas', 'apreensao_drogas_sem_autor', 'ameaca']
index = [1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012,
2013, 2014, 2015, 2016, 2017, 2018, 2019]
new_df = pd.DataFrame(index=index, columns=columns)
Then i need to fill each column in this DataFrame with a aggregation function, like:
new_df.hom_doloso = df_clean.groupby('vano').hom_doloso.sum()
returning:
new_df.head()
hom_doloso lesao_corp_morte latrocinio ...
1991 7518 NaN NaN
1992 7635 NaN NaN
1993 7720 NaN NaN
1994 8408 NaN NaN
The above code only completes one column of the dataframe, i really don't want to write all this columns names, so i tried to use For Loop, but i can't do that
for column_name in columns:
new_df.column_name = df_clean.groupby('vano').column_name.sum()
The column_name in this for need to be replace by the each column name
df_clean: df_clean.shape: (344, 56)
df_clean.head()```
vano hom_doloso lesao_corp_morte latrocinio ...
0 1991 7518 0.0 18
1 1992 7635 0.0 17
2 1993 7720 0.0 16
3 1994 8408 0.0 15
The new_df will only have these columns in variable called columns, which df_clean also have.
I need to groupby by 'vano' to return the sum of each column.
Is there some way to do this?
Upvotes: 1
Views: 130
Reputation: 153460
No looping needed. Let's try something like this:
np.random.seed(123)
df = pd.DataFrame(np.random.randint(0,100,(5,5)), columns=[*'ABCDE'])
df['grp'] = ['A','A','A','B','B']
print(df)
Output:
A B C D E grp
0 66 92 98 17 83 A
1 57 86 97 96 47 A
2 73 32 46 96 25 A
3 83 78 36 96 80 B
4 68 49 55 67 2 B
Now, let's reshape dataframe stack all columns we want to sum, set index on all columns that we don't want to sum and use groupby with levels. Lastly, unstack the inner most level to get columns back.
df.set_index('grp').stack().groupby(level=[0,1]).sum().unstack()
or
df.set_index('grp').stack().sum(level=[0,1]).unstack().reset_index()
Output:
grp A B C D E
0 A 196 210 241 209 155
1 B 151 127 91 163 82
Upvotes: 1