mathdeoliveira
mathdeoliveira

Reputation: 13

python - i need to fill dataframe with for loop using sum()

how are?

i need to complete my dataframe with data returned by a sum function.

so, I have this code that creates my dataframe:

columns = ['hom_doloso', 'lesao_corp_morte', 'latrocinio',
       'hom_por_interv_policial', 'tentat_hom', 'lesao_corp_dolosa', 'estupro',
       'hom_culposo', 'lesao_corp_culposa', 'roubo_comercio',
       'roubo_residencia', 'roubo_veiculo', 'roubo_carga', 'roubo_transeunte',
       'roubo_em_coletivo', 'roubo_banco', 'roubo_cx_eletronico',
       'roubo_celular', 'roubo_conducao_saque', 'roubo_apos_saque',
       'roubo_bicicleta', 'outros_roubos', 'total_roubos', 'furto_veiculos',
       'furto_transeunte', 'furto_coletivo', 'furto_celular',
       'furto_bicicleta', 'outros_furtos', 'sequestro',
       'extorsao', 'sequestro_relampago', 'estelionato', 'apreensao_drogas',
       'posse_drogas', 'trafico_drogas', 'apreensao_drogas_sem_autor', 'ameaca']

index = [1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
       2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012,
       2013, 2014, 2015, 2016, 2017, 2018, 2019] 
new_df = pd.DataFrame(index=index, columns=columns) 

Then i need to fill each column in this DataFrame with a aggregation function, like:

new_df.hom_doloso = df_clean.groupby('vano').hom_doloso.sum()

returning:

new_df.head()
      hom_doloso   lesao_corp_morte   latrocinio  ...
1991    7518              NaN             NaN
1992    7635              NaN             NaN
1993    7720              NaN             NaN
1994    8408              NaN             NaN

The above code only completes one column of the dataframe, i really don't want to write all this columns names, so i tried to use For Loop, but i can't do that

for column_name in columns:
  new_df.column_name = df_clean.groupby('vano').column_name.sum()

The column_name in this for need to be replace by the each column name

df_clean: df_clean.shape: (344, 56)

df_clean.head()```
      vano    hom_doloso   lesao_corp_morte   latrocinio  ...
0     1991    7518              0.0             18
1     1992    7635              0.0             17
2     1993    7720              0.0             16
3     1994    8408              0.0             15

The new_df will only have these columns in variable called columns, which df_clean also have.

I need to groupby by 'vano' to return the sum of each column.

Is there some way to do this?

Upvotes: 1

Views: 130

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

No looping needed. Let's try something like this:

np.random.seed(123)
df = pd.DataFrame(np.random.randint(0,100,(5,5)), columns=[*'ABCDE'])
df['grp'] = ['A','A','A','B','B']
print(df)

Output:

    A   B   C   D   E grp
0  66  92  98  17  83   A
1  57  86  97  96  47   A
2  73  32  46  96  25   A
3  83  78  36  96  80   B
4  68  49  55  67   2   B

Now, let's reshape dataframe stack all columns we want to sum, set index on all columns that we don't want to sum and use groupby with levels. Lastly, unstack the inner most level to get columns back.

df.set_index('grp').stack().groupby(level=[0,1]).sum().unstack()

or

df.set_index('grp').stack().sum(level=[0,1]).unstack().reset_index()

Output:

  grp    A    B    C    D    E
0   A  196  210  241  209  155
1   B  151  127   91  163   82

Upvotes: 1

Related Questions