Reputation: 11
I am trying to create a correlation table in a Python visual in a Power Bi report with matplotlib. The problem is the code will run and produce a visual with all null values "nan". enter image description here
I had to do a lot of data transformations to my variables -- i.e. for specific years and to aggregate all variables by a variable called CIP Family. I want to know if this code can run and if so how.
import pandas as pd
import matplotlib.pyplot as plt
dataset['Date'] = pd.to_datetime(dataset['Date'], errors='coerce')
# 1. Filter and group data for CTOTALTforFinal99Family (2010-2019)
filtered_CTOTALTforFinal99Family = dataset[
(dataset['Date'] >= '2010-01-01') & (dataset['Date'] <= '2019-01-01')
].groupby('CIP Family').agg({'CTOTALTforFinal99Family': 'sum'}).reset_index()
# 2. Filter and group data for CTOTALTNew2020Family (2020-2023)
filtered_CTOTALTNew2020Family = dataset[
(dataset['Date'] >= '2020-01-01') & (dataset['Date'] <= '2023-01-01')
].groupby('CIP Family').agg({'CTOTALTNew2020Family': 'sum'}).reset_index()
# 3. Group and sum New2020CountFamily by 'CIP Family'
grouped_New2020CountFamily = dataset.groupby('CIP Family').agg({'New2020CountFamily': 'sum'}).reset_index()
# 4. Calculate PerctCTOTALT99Family for each 'CIP Family' (2010-2019)
filtered_2010_2019 = dataset[(dataset['Date'] >= '2010-01-01') & (dataset['Date'] <= '2019-01-01')]
grouped_2010_2019 = filtered_2010_2019.groupby('CIP Family').agg({
'CTOTALTforFinal99Family': 'sum',
'CTOTALTFamily': 'sum'
}).reset_index()
grouped_2010_2019['PerctCTOTALT99Family'] = (
grouped_2010_2019['CTOTALTforFinal99Family'] / grouped_2010_2019['CTOTALTFamily']
) * 100
grouped_2010_2019['PerctCTOTALT99Family'] = grouped_2010_2019['PerctCTOTALT99Family'].fillna(0)
# 5. Calculate PerctCTOTALTNew2020Family for each 'CIP Family' (2020-2023)
filtered_2020_2023 = dataset[(dataset['Date'] >= '2020-01-01') & (dataset['Date'] <= '2023-01-01')]
grouped_2020_2023 = filtered_2020_2023.groupby('CIP Family').agg({
'CTOTALTNew2020Family': 'sum',
'CTOTALTFamily': 'sum'
}).reset_index()
grouped_2020_2023['PerctCTOTALTNew2020Family'] = (
grouped_2020_2023['CTOTALTNew2020Family'] / grouped_2020_2023['CTOTALTFamily']
) * 100
grouped_2020_2023['PerctCTOTALTNew2020Family'] = grouped_2020_2023['PerctCTOTALTNew2020Family'].fillna(0)
# Merge all variables of interest by 'CIP Family'
merged_data = (
grouped_2010_2019[['CIP Family', 'CTOTALTforFinal99Family', 'PerctCTOTALT99Family']]
.merge(grouped_2020_2023[['CIP Family', 'CTOTALTNew2020Family', 'PerctCTOTALTNew2020Family']], on='CIP Family', how='outer')
.merge(grouped_New2020CountFamily, on='CIP Family', how='outer')
)
# Final Column Selection and Ordering
final_data = merged_data[[
'CIP Family',
'CTOTALTforFinal99Family',
'PerctCTOTALT99Family',
'CTOTALTNew2020Family',
'PerctCTOTALTNew2020Family',
'New2020CountFamily'
]]
# Compute the correlation matrix
correlation_data = final_data.drop(columns='CIP Family').copy()
corr = final_data.corr().round(3)
# Create a new figure for displaying the table
fig, ax = plt.subplots(figsize=(12, 6)) # Adjust figure size
# Hide the axes (only show the table)
ax.axis('off')
# Display the correlation matrix as a table
table = ax.table(
cellText=corr.values,
colLabels=corr.columns,
rowLabels=corr.index,
loc='center',
cellLoc='center',
bbox=[0, 0, 1, 1], # Table occupies full figure area
)
# Customize table header and styling
table.auto_set_font_size(False)
table.set_fontsize(10) # Adjust font size as needed
# Bold the header row and column
for (i, j), cell in table.get_celld().items():
if i == 0 or j == -1:
cell.set_text_props(weight='bold')
# Add a title for clarity
plt.title("Correlation Matrix", fontsize=16, weight='bold')
# Show the table in Power BI
plt.show()
As an aside -- a slightly modified version ran fine in jupyter notebook. I'm wondering if in trying to adapt the code for PowerBi I made a mistake or if there's something in my code that doesn't run because of limitations specific to Python visuals. I've exhausted what I can personally do with ChatGPT to pinpoint the problem. I truly don't understand Python visuals.
Upvotes: 1
Views: 39