Reputation: 525
I am getting a strange result in a Power BI Python visual. I am working with the diamonds dataset (sns.load_dataset('diamonds')). I have this code in the Python visual editor:
import seaborn as sns
import matplotlib.pyplot as plt
# 'dataset' is the DataFrame that Power BI passes to the Python visual
sns.histplot(dataset['carat'], bins=50)
plt.show()
However, I am getting this visual (most of the bars are truncated; it should be a roughly bell-shaped curve with the tallest bar reaching about 11,000):
I have tried a seaborn swarmplot and that looks OK, so it does not seem to be a data type issue. The dataset has 53,940 rows, well below the 150,000-row maximum. Matplotlib's plt.hist(dataset['carat']) also returns the truncated visual, so it does not look like a seaborn issue.
Upvotes: 0
Views: 494
Reputation: 4005
The Python visual warns you that it will drop duplicate rows, and it also shows the formula it will use to build the dataframe your plot is actually based on:
By adding an index column in Power Query before loading the data, and adding both the (non-summarized) index column and the carat column to the visualization, you will avoid this duplicate removal.
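To illustrate why the index column helps, here is a minimal sketch in plain pandas, run outside Power BI. The small carat list is made-up illustration data, and drop_duplicates() only stands in for the duplicate removal that the visual performs; treat that equivalence as an assumption, not as Power BI's actual implementation.

```python
import pandas as pd

# Made-up stand-in for the 'carat' column, with many repeated values.
carat = [0.3] * 5 + [0.5] * 3 + [1.0] * 2
df = pd.DataFrame({"carat": carat})

# With only 'carat' in the visual, identical rows collapse to one row per value
# (assumed here to behave like drop_duplicates()).
deduplicated = df.drop_duplicates()

# With an index column, every row is unique, so nothing is dropped.
with_index = df.reset_index().rename(columns={"index": "Index"})
kept = with_index.drop_duplicates()

print(len(df), len(deduplicated), len(kept))  # prints: 10 3 10
```

With only the carat column, a histogram would count each distinct value once, which matches the flattened bars in the question. With the index column present, all rows survive and the counts are the real ones.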
Here I have used your exact code, but the visual evaluates an incoming data frame with all the rows instead of only distinct carat values:
Upvotes: 1