I am trying to do a couple of simple operations with this data set.
I am trying to:
Can someone help me write the code for this, please?
Assuming you've called your DataFrame df, you can do the following:
Point 1: Use the groupby() method on the clusters column and calculate the sum with the sum() aggregation method, like:
df_grouped = df.groupby('clusters')[['count']].sum()  # sum only 'count', not every other column
Once that's done, you may want to rename the summed column to something more descriptive, like:
df_grouped = df_grouped.rename(columns={'count': 'cluster_count'})
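Point 1 can be tried end to end on a small frame. This is a sketch, not the asker's data: the values below are made up, and only the column names clusters and count come from the answer. Selecting [['count']] before summing is an assumption that keeps any other columns out of the sum.

```python
import pandas as pd

# Hypothetical toy data: made-up values, column names taken from the answer.
df = pd.DataFrame({
    "clusters": ["a", "a", "b", "b", "b"],
    "count": [2, 6, 1, 1, 2],
})

# Sum 'count' within each cluster; [['count']] keeps the result a DataFrame
# and avoids summing any other columns the real data might have.
df_grouped = df.groupby("clusters")[["count"]].sum()

# Rename so the total cannot clash with the original 'count' column later.
df_grouped = df_grouped.rename(columns={"count": "cluster_count"})

print(df_grouped)
```

The cluster labels end up in the index (named clusters), with totals of 8 for a and 4 for b.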
Point 2: To get the summed totals back into your DataFrame, merge df_grouped with your original DataFrame, like:
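import pandas as pd

# Match each row's 'clusters' value against the cluster labels in df_grouped's index
df_merged = pd.merge(left=df,
                     right=df_grouped,
                     left_on='clusters',
                     right_index=True)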
Here the 'clusters' column is the key for the left DataFrame, and the index of df_grouped is the key for the right one (after the groupby() operation in point 1, the cluster values are in that index).
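To see what the merge produces, here is a small self-contained sketch with made-up values (only the clusters and count column names come from the answer). Every row picks up the total of its own cluster.

```python
import pandas as pd

# Hypothetical toy data: made-up values, column names taken from the answer.
df = pd.DataFrame({
    "clusters": ["a", "a", "b", "b", "b"],
    "count": [2, 6, 1, 1, 2],
})

# Per-cluster totals with the cluster labels in the index.
df_grouped = (
    df.groupby("clusters")[["count"]].sum()
    .rename(columns={"count": "cluster_count"})
)

# Left key: the 'clusters' column; right key: df_grouped's index.
df_merged = pd.merge(left=df,
                     right=df_grouped,
                     left_on="clusters",
                     right_index=True)

print(df_merged)
```

Rows in cluster a get a cluster_count of 8 and rows in cluster b get 4. Because the summed column was renamed first, the result has no count_x/count_y suffix clash.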
Point 3: The last step is now trivial. Add a new column to the merged DataFrame that holds the result of the calculation:
df_merged['count_pct_cluster'] = df_merged['count'] / df_merged['cluster_count'] * 100
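The three points can be bundled into one function. This is a sketch: add_cluster_pct is a hypothetical name, and the toy data is made up. A quick sanity check is that the percentages inside each cluster add up to 100.

```python
import pandas as pd


def add_cluster_pct(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper wrapping points 1-3; returns a new DataFrame
    with 'cluster_count' and 'count_pct_cluster' columns added."""
    # Point 1: per-cluster totals, cluster labels end up in the index.
    df_grouped = (
        df.groupby("clusters")[["count"]].sum()
        .rename(columns={"count": "cluster_count"})
    )
    # Point 2: attach each row's cluster total via the index.
    df_merged = pd.merge(left=df, right=df_grouped,
                         left_on="clusters", right_index=True)
    # Point 3: each row's share of its cluster total, in percent.
    df_merged["count_pct_cluster"] = (
        df_merged["count"] / df_merged["cluster_count"] * 100
    )
    return df_merged


# Hypothetical toy data: made-up values, column names taken from the answer.
df = pd.DataFrame({
    "clusters": ["a", "a", "b", "b", "b"],
    "count": [2, 6, 1, 1, 2],
})
result = add_cluster_pct(df)

# Sanity check: the percentages inside each cluster should sum to 100.
print(result.groupby("clusters")["count_pct_cluster"].sum())
```

pd.merge returns a new DataFrame, so the input df is left unchanged.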
To calculate the total of counts attributed to each cluster, use this code:
total = df.groupby('clusters')['count'].sum().rename('total of counts')
To add a new column 'total of counts' that holds, on each row, the total for that row's cluster, use this code:
df = df.join(total, on='clusters')  # look up each row's cluster in total's index
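The two lines above can be run on a small frame as a sketch (made-up values; only the column names come from the answer). join(on=...) looks up each row's clusters value in the index of the total Series, which works because the Series has a name to use as the new column. lsuffix is left out here; it only matters when the two objects share column names.

```python
import pandas as pd

# Hypothetical toy data: made-up values, column names taken from the answer.
df = pd.DataFrame({
    "clusters": ["a", "a", "b", "b", "b"],
    "count": [2, 6, 1, 1, 2],
})

# A Series indexed by cluster label, named after the column it will become.
total = df.groupby("clusters")["count"].sum().rename("total of counts")

# Left join: each row gets the total of its own cluster, row order is kept.
df = df.join(total, on="clusters")

print(df)
```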
To divide column 'count' by 'total of counts' and multiply by 100, use this code:
df['counts by total of counts'] = df['count']/df['total of counts']*100
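As a point of comparison (not part of this answer), groupby().transform("sum") returns a Series aligned with the original rows that holds each row's cluster total, so the same percentage can be computed without a join. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical toy data: made-up values, column names taken from the answer.
df = pd.DataFrame({
    "clusters": ["a", "a", "b", "b", "b"],
    "count": [2, 6, 1, 1, 2],
})

# The answer's approach: join the per-cluster totals, then divide.
total = df.groupby("clusters")["count"].sum().rename("total of counts")
df = df.join(total, on="clusters")
df["counts by total of counts"] = df["count"] / df["total of counts"] * 100

# Alternative (not from the answer): transform("sum") broadcasts each
# cluster's total back onto its rows, so no separate join is needed.
pct_alt = df["count"] / df.groupby("clusters")["count"].transform("sum") * 100

print(df)
print(pct_alt.tolist())
```

Both give 25, 75, 25, 25 and 50 percent for the five rows. The join version keeps the totals as a visible column; transform is shorter when only the ratio is needed.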
You can do this with the line of code below. It creates a new column called total holding, for each row, the mean of the values in columns 0 to 11 (the first 12 columns). You can replace mean() with any other operation you need.
df['total'] = df.iloc[:, :12].mean(axis=1)  # axis=1: average across each row, not down each column
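The axis argument decides which direction mean() reduces in. The sketch below uses a hypothetical wide frame with 14 numeric columns labelled 0 to 13. Without axis=1, mean() averages down each column and returns one value per column label, which is not a per-row result. With axis=1 it averages across positions 0 to 11 of each row.

```python
import pandas as pd

# Hypothetical wide frame: 3 rows, 14 numeric columns labelled 0..13.
df = pd.DataFrame([[float(r + c) for c in range(14)] for r in range(3)])

# Default axis=0: one mean per column (12 values indexed by column label).
col_means = df.iloc[:, :12].mean()

# axis=1: one mean per row over the first 12 columns (positions 0 to 11).
df["total"] = df.iloc[:, :12].mean(axis=1)

# Any other row-wise reduction works the same way, e.g. a row sum.
# New columns are appended at the end, so iloc[:, :12] still hits columns 0..11.
df["row_sum"] = df.iloc[:, :12].sum(axis=1)

print(df[["total", "row_sum"]])
```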