ZZplantzzz
ZZplantzzz

Reputation: 11

Power Bi Python Visual Data Transformation and Table

I am trying to create a correlation table in a Python visual in a Power Bi report with matplotlib. The problem is the code will run and produce a visual with all null values "nan". enter image description here

I had to do a lot of data transformations to my variables -- i.e. for specific years and to aggregate all variables by a variable called CIP Family. I want to know if this code can run and if so how.

import pandas as pd
import matplotlib.pyplot as plt 

dataset['Date'] = pd.to_datetime(dataset['Date'], errors='coerce')

# 1. Filter and group data for CTOTALTforFinal99Family (2010-2019)
filtered_CTOTALTforFinal99Family = dataset[
    (dataset['Date'] >= '2010-01-01') & (dataset['Date'] <= '2019-01-01')
].groupby('CIP Family').agg({'CTOTALTforFinal99Family': 'sum'}).reset_index()

# 2. Filter and group data for CTOTALTNew2020Family (2020-2023)
filtered_CTOTALTNew2020Family = dataset[
    (dataset['Date'] >= '2020-01-01') & (dataset['Date'] <= '2023-01-01')
].groupby('CIP Family').agg({'CTOTALTNew2020Family': 'sum'}).reset_index()

# 3. Group and sum New2020CountFamily by 'CIP Family'
grouped_New2020CountFamily = dataset.groupby('CIP Family').agg({'New2020CountFamily': 'sum'}).reset_index()

# 4. Calculate PerctCTOTALT99Family for each 'CIP Family' (2010-2019)

filtered_2010_2019 = dataset[(dataset['Date'] >= '2010-01-01') & (dataset['Date'] <= '2019-01-01')]

grouped_2010_2019 = filtered_2010_2019.groupby('CIP Family').agg({
    'CTOTALTforFinal99Family': 'sum',
    'CTOTALTFamily': 'sum'
}).reset_index()

grouped_2010_2019['PerctCTOTALT99Family'] = (
    grouped_2010_2019['CTOTALTforFinal99Family'] / grouped_2010_2019['CTOTALTFamily']
) * 100
grouped_2010_2019['PerctCTOTALT99Family'] = grouped_2010_2019['PerctCTOTALT99Family'].fillna(0)

 # 5. Calculate PerctCTOTALTNew2020Family for each 'CIP Family' (2020-2023)
filtered_2020_2023 = dataset[(dataset['Date'] >= '2020-01-01') & (dataset['Date'] <= '2023-01-01')]
grouped_2020_2023 = filtered_2020_2023.groupby('CIP Family').agg({
    'CTOTALTNew2020Family': 'sum',
    'CTOTALTFamily': 'sum'
}).reset_index()

grouped_2020_2023['PerctCTOTALTNew2020Family'] = (
    grouped_2020_2023['CTOTALTNew2020Family'] / grouped_2020_2023['CTOTALTFamily']
) * 100
grouped_2020_2023['PerctCTOTALTNew2020Family'] = grouped_2020_2023['PerctCTOTALTNew2020Family'].fillna(0)

# Merge all variables of interest by 'CIP Family'
merged_data = (
    grouped_2010_2019[['CIP Family', 'CTOTALTforFinal99Family', 'PerctCTOTALT99Family']]
    .merge(grouped_2020_2023[['CIP Family', 'CTOTALTNew2020Family', 'PerctCTOTALTNew2020Family']], on='CIP Family', how='outer')
    .merge(grouped_New2020CountFamily, on='CIP Family', how='outer')
)

# Final Column Selection and Ordering
final_data = merged_data[[
    'CIP Family',
    'CTOTALTforFinal99Family',
    'PerctCTOTALT99Family',
    'CTOTALTNew2020Family',
    'PerctCTOTALTNew2020Family',
    'New2020CountFamily'
]]

# Compute the correlation matrix
correlation_data = final_data.drop(columns='CIP Family').copy()
corr = final_data.corr().round(3)

# Create a new figure for displaying the table
fig, ax = plt.subplots(figsize=(12, 6))  # Adjust figure size

# Hide the axes (only show the table)
ax.axis('off')

# Display the correlation matrix as a table
table = ax.table(
    cellText=corr.values,
    colLabels=corr.columns,
    rowLabels=corr.index,
    loc='center',
    cellLoc='center',
    bbox=[0, 0, 1, 1],  # Table occupies full figure area
)

# Customize table header and styling
table.auto_set_font_size(False)
table.set_fontsize(10)  # Adjust font size as needed

# Bold the header row and column
for (i, j), cell in table.get_celld().items():
    if i == 0 or j == -1:
        cell.set_text_props(weight='bold')

# Add a title for clarity
plt.title("Correlation Matrix", fontsize=16, weight='bold')

# Show the table in Power BI
plt.show()

As an aside -- a slightly modified version ran fine in jupyter notebook. I'm wondering if in trying to adapt the code for PowerBi I made a mistake or if there's something in my code that doesn't run because of limitations specific to Python visuals. I've exhausted what I can personally do with ChatGPT to pinpoint the problem. I truly don't understand Python visuals.

Upvotes: 1

Views: 39

Answers (0)

Related Questions