DNau
DNau

Reputation: 1

Calculating the correlation of each row with one in Group in Python

I need some help to calculate 4 metrics for my table. What is the correct way to constantly refer to the same string within the department?

I have been trying to solve this problem for a very long time, but I can not come to a logical solution. I have tried Partition from SQLalchemy, but my skills wasn't enough. I tried the def function, but functions are my weak point.

Therefore, I ask you for help with a solution or advice. Thank you!!!

my real task is not about salaries)))) don't worry about it!

EXAMPLE:

import pandas as pd

df2=pd.DataFrame({
    'head_department': [ 'Anna','Anna', 'Anna','Anna','Anna',
                         'John', 'John','John',
                         'Denis', 'Denis',
                         'Sarah',
                         'Greg', 'Greg','Greg','Greg', 'Greg','Greg'],
    'serial_number': [1,2,3,4,5,1,2,3,1,2,1,1,2,3,4,5,6],
    'Worker_in_dpt': ['Anna','Anna-s_worker_2','Anna-s_worker_3','Anna-s_worker_4','Anna-s_worker_5',
                      'John','John-s worker_2','John-s_worker_3',
                      'Denis','Denis-s_worker_2',
                      'Sarah',
                      'Greg','Greg-s_worker_2', 'Greg-s_worker_3', 'Greg-s_worker_4','Greg-s_worker_5', 'Greg-s_worker_6'],
   'jan_salary': [1000, 600, 600, 500, 900, None, 600, 500, 1200, 800, 1400, None, 700, 600, 600, 450, 700],
   'feb_salary': [1100, 700, 700, 700, 800, None, 500, 400, 1800, 900, 1000, 1400, 900, 800, 500, 450, 700],
   'mar_salary': [1200, 800, 800, 900, 700, 1300, 600, 500, 1800, 600, 1100, 1600, 400, 700, 600, 250, 700],
   'apr_salary': [1600, 900, 900, 700, 700, 2300, 500, 400, 1800, 900, 1100, 1900, 200, 900, 500, 150, 700],
   'may_salary': [1100, 700, 700, 700, 800, 2300, 500, 400, 1800, 900, 1000, 1400, 900, 800, 500, 450, 700],
   'jun_salary': [1200, 800, 800, 900, 700, 1300, 500, 400, 1800, 900, 1000, 1400, 900, 800, 500, 450, 800],
   'jul_salary': [1000, 600, 600, 500, 900, 1300, 600, 500, 1200, 800, 1400, 1200, 700, 600, 600, 450, 700],
   'aug_salary': [1100, 700, 700, 700, 800, 2300, 600, 500, 1800, 600, 1100, 1600, 400, 700, 600, 250, None]
})


df2['serial_number'] = df2['serial_number'].astype('str') # else it computed as int, when i use numeric_only=True
df2[ 'mean_salary_per_period'] = df2.mean(numeric_only=True, axis=1)
df2

enter image description here


i'm trying calculate columns:

WHERE:

Thank you!!!

Upvotes: 0

Views: 55

Answers (1)

Rawson
Rawson

Reputation: 2822

The following will give you the desired output, using groupby.apply() with a function:

df = df2.loc[:, ~df2.columns.isin(
    ["Worker_in_dpt", "mean_salary_per_period"])] \
    .set_index(['head_department', 'serial_number']).stack(dropna=False)

def func(frame):
    # unstack for serial number as columns
    frame = frame.unstack(level=["serial_number"])
    # correlations and take first row
    corr = frame.corr().iloc[0]
    # find months that are nan for serial number 1 (first column)
    nan = frame.iloc[:, 0].isna()
    # mean of non-nans
    mean_not_nan = frame.loc[~nan, :].mean(axis=0)
    # if nan, mean for these, else same as non-nans
    if nan.sum() > 0:    
        mean_nan = frame.loc[nan, :].mean(axis=0)
    else:
        mean_nan = mean_not_nan.copy()
    # sum for january
    sum_jan = pd.Series(data=frame.loc[
        frame.index.get_level_values(-1).str.contains("jan")].sum(axis=1)[0],
        index=frame.columns)
    
    # concat all series and name columns
    return pd.concat([
        corr, mean_nan, mean_not_nan, sum_jan], axis=1) \
        .set_axis(
            ['corr with head in dpt',
             'mean _when_1_in _NAN',
             'mean _when_1_in_NAN',
             'sum all dpt in Jan'], axis=1)

# groupby head_department and apply func
additional_cols = df.groupby(level="head_department").apply(func)

# merge df2 with grouped outputs
out = pd.merge(df2, additional_cols,
               left_on=['head_department', 'serial_number'],
               right_index=True)

Upvotes: 1

Related Questions