I am trying to do a couple of simple operation with this data set. I am trying to: Calculate the total of counts attributed to each cluster. For example, for cluster 0, I would have to sum 7+4+61+7+12= 91 add a new column 'total of counts' where the total of counts appears paired up with the corresponding cluster (i.e. rows with a value of '0' in the 'clusters' column, will have a value of 91 in the 'total of counts' column divide column 'counts' by 'total of counts' and multiply by 100 (calculate the percentage of counts). The result should be added into a new column. Can someone help me to write a code for this, please?

To calculate the total of counts attributed to each cluster, use this code: total = df.groupby('clusters')['count'].sum().rename('total of counts') To add a new column 'total of counts' where the total of counts appears paired up with the corresponding cluster, use this code: df = df.join(total, on='clusters', lsuffix='') To divide column 'counts' by 'total of counts' and multiply by 100, use this code: df['counts by total of counts'] = df['count']/df['total of counts']*100

pythonpandascountpercentage

Valentina Grillo-Alvarado

Reputation: 49

Adding a new column in python based on the value of another column pandas python

I am trying to do a couple of simple operation with this data set.

I am trying to:

Calculate the total of counts attributed to each cluster. For example, for cluster 0, I would have to sum 7+4+61+7+12= 91
add a new column 'total of counts' where the total of counts appears paired up with the corresponding cluster (i.e. rows with a value of '0' in the 'clusters' column, will have a value of 91 in the 'total of counts' column
divide column 'counts' by 'total of counts' and multiply by 100 (calculate the percentage of counts). The result should be added into a new column.

Can someone help me to write a code for this, please?

Upvotes: 1

Answers (3)

robbo

Reputation: 545

Assuming you've called your dataframe df, you can do the following:

point 1 use the groupby() method on the clusters column and calculate the sum using the sum() aggregation method like:

df_grouped = df.groupby('clusters').sum()

Once done, you might want to rename the column in that dataframe to something more useful like:

df_grouped = df_grouped.rename(columns={'count': 'cluster_count'})

point 2 To get the summed totals back into your dataframe you can merge the grouped_df with your original dataframe like:

df_merged = pd.merge(left=df, 
                     right=df_grouped, 
                     left_on='clusters', 
                     right_index=True)

Where you use the 'clusters' column is the key for your left dataframe and use the index of the df_grouped dataframe (the cluster values will be in the index there after the groupby() operation in point 1).

point 3 The last step is now trivial. Just use your final dataframe and add a new column that contains the result of the required calculation:

df_merged['count_pct_cluster'] = df_merged['count'] / df_merged['cluster_count'] * 100

Upvotes: 2

Mohit Natani

Reputation: 52

To calculate the total of counts attributed to each cluster, use this code:

total = df.groupby('clusters')['count'].sum().rename('total of counts')
To add a new column 'total of counts' where the total of counts appears paired up with the corresponding cluster, use this code:

df = df.join(total, on='clusters', lsuffix='')
To divide column 'counts' by 'total of counts' and multiply by 100, use this code:

df['counts by total of counts'] = df['count']/df['total of counts']*100

Upvotes: 2

Faten Elhariry

Reputation: 3

you can do this by using this line of code will provide you with new column called total and the value of this column will be the mean of values from column 0 to 11 and here you can replace the mean value with any other operation you need

 df['total'] = df.iloc[:,:12].mean()

Upvotes: 0

Adding a new column in python based on the value of another column pandas python

Answers (3)

Related Questions