testenthu
testenthu

Reputation: 79

absolute difference giving minus percentages

I have a DataFrame to find the percentage of absolute difference between two sources in python. But when I use the below code, few of the columns are giving the -% (minus percentages)

I have checked the columns showing minus percentage datatypes are same in both the sources.

Can anyone help me to find out why?

    # Define the columns you want to process
columns = ['a', 'b', 'c', 'd']

# Create the results DataFrame
results_df = pd.DataFrame()

results_df['date'] = c_df['date']
results_df['id'] = c_df['id']

for col in columns:
    # calculating the absolute difference
    diff = np.abs(c_df[f'{col}_s1'] - c_df[f'{col}_s2'])

    # calculating mean for snowflake columns
    norm_diff = c_df[f'{col}_s1'].mean()

    # Avoid division by zero
    if norm_diff == 0:
        percentage = np.zeros(len(diff))
    else:
        # calculating percentage difference b/w the columns in both datasets
        percentage = (diff / norm_diff) * 100

    # Round the percentage_difference and add % symbol
    results_df[col] = [f"{round(diff)}%" for diff in percentage]

display(results_df)

And the results are something like this.

a b c d
0% 5% 10% -5%
10% 100% -50% 200%

Upvotes: 1

Views: 74

Answers (1)

Anerdw
Anerdw

Reputation: 1987

Can't be sure without seeing your data, but it could be the way you're calculating norm_diff. When you run norm_diff = c_df[f'{col}_s1'].mean(), you'll end up with negative values if the average for the column is negative. That would give you a negative value when you calculate percentage later.

Try logging the value of norm_diff and percentage after each loop; if this is what's happening, percentage will be negative exactly when norm_diff is.

Depending on what you're looking for, you might be able to fix this just by taking the absolute value when you calculate norm_diff:

- norm_diff = c_df[f'{col}_s1'].mean()
+ norm_diff = abs(c_df[f'{col}_s1'].mean())

Upvotes: 1

Related Questions