Reputation: 2090
I'm adding a Python script as part of a Tableau calculated field, and it appears Tableau is passing one row of data at a time to the calculated field instead of the whole lists (for _arg1 and _arg2). I already have TabPy set up and connected to the local host, and I can run "hello world!" type scripts without errors. I'm trying to apply some simple DBSCAN tutorials I found online to my own dataset. I have a 2-D scatter plot in Tableau and I'm trying to cluster the data points using the two axes of the plot. Here's the code for the calculated field I'm using now:
SCRIPT_STR(
"from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import string
def int_to_string(val):
    if val == -2:
        return 'NaN'
    elif val == -1:
        return 'Outlier'
    else:
        return string.ascii_lowercase[val]
eps=1
min_samples=10
ids = range(len(_arg1))
X = np.column_stack([_arg1, _arg2])
X = pd.DataFrame(X, index=ids, columns=['x', 'y'])
X.dropna(how='any', inplace=True)
X_scale = StandardScaler().fit_transform(X)
labels = DBSCAN(eps=eps, n_jobs=-1,
                min_samples=min_samples).fit_predict(X_scale)
result = pd.Series(index=X.index)
result.loc[X.index] = labels
result.fillna(-2, inplace=True)
result = result.apply(int_to_string)
return list(result)",
avg([Var1]), avg([Var2])
)
It's more complicated than the tutorial because my data set has NaN values and I'm trying to handle those with the pandas code.
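For reference, here is a minimal, pandas-free sketch of the NaN bookkeeping the script is aiming for. It assumes (this is my understanding of TabPy, not something confirmed in this thread) that the script should return one value per input row, so rows with missing values get a placeholder label instead of being dropped. The helper names is_missing, label_to_name and expand_labels are hypothetical, and the cluster labels are hard-coded rather than produced by DBSCAN. Note that the snippet above builds result on the filtered index, so it would come back shorter than the input whenever rows are dropped.

```python
import math
import string

def is_missing(v):
    # Treat both None (Tableau NULL) and float NaN as missing.
    return v is None or (isinstance(v, float) and math.isnan(v))

def label_to_name(label):
    # -2 marks a row that was skipped, -1 is the DBSCAN noise label.
    if label == -2:
        return 'NaN'
    if label == -1:
        return 'Outlier'
    # int() guards against float labels; more than 26 clusters would
    # raise IndexError here.
    return string.ascii_lowercase[int(label)]

def expand_labels(xs, ys, labels_for_valid_rows):
    # Re-align labels computed on the valid rows with the full input,
    # filling skipped rows with -2 so the output length matches the input.
    remaining = iter(labels_for_valid_rows)
    full = []
    for x, y in zip(xs, ys):
        if is_missing(x) or is_missing(y):
            full.append(-2)
        else:
            full.append(next(remaining))
    return [label_to_name(v) for v in full]

# Hypothetical data: rows 1 and 2 each have a missing coordinate.
xs = [1.0, None, 3.0, 4.0]
ys = [2.0, 5.0, float('nan'), 1.0]
# Pretend the clustering step labelled the two valid rows 0 and -1.
names = expand_labels(xs, ys, [0, -1])
```

Here names has four entries, one per input row, with the two incomplete rows marked 'NaN'.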
The real problem is that the X DataFrame seems to be only 1 row in size. I know that's not true for the actual data; in Tableau, there are thousands of data points showing on the scatter plot. I know that it only has 1 row of data because I get the following error from Tableau (I think this error occurs when the one row of data happens to have a null value in it)...
...and because I added a pickle statement to the script for a little while to export the X DataFrame to a file, and when I open that pickled object in Python it shows the DataFrame has a shape of (1, 2): 1 row and 2 columns.
Var1 and Var2 aren't aggregated fields or anything, so taking the average should not reduce them to a single value.
Has anyone run into this before? What is wrong with the Tableau Script code that might be causing this issue? Or am I doing something else wrong?
Upvotes: 2
Views: 1383
Reputation: 71
To send all of your data at once, change the addressing settings for your script calculation. SCRIPT_ functions are table calculations, so Tableau calls the script once per partition and passes only that partition's rows as _arg1 and _arg2. Say you put your calculation on Rows: right-click it, select Edit Table Calculation, choose Specific Dimensions, and check every dimension listed there.
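To see why the addressing setting matters, here is a rough Python model of the behaviour. It is only a sketch of how I understand partitioning to work, not Tableau's actual implementation; run_script_per_partition, report_size and the sample rows are all made up for illustration.

```python
from itertools import groupby

def run_script_per_partition(rows, partition_key, script):
    # Model: the script is called once per partition and receives the
    # partition's columns as lists, like _arg1 and _arg2.
    results = []
    ordered = sorted(rows, key=partition_key)
    for _, group in groupby(ordered, key=partition_key):
        group = list(group)
        arg1 = [r["x"] for r in group]
        arg2 = [r["y"] for r in group]
        results.append(script(arg1, arg2))
    return results

def report_size(_arg1, _arg2):
    # Stand-in for the real script: report how many rows it was given.
    return len(_arg1)

# Hypothetical marks, one per value of the "id" dimension.
rows = [{"id": i, "x": float(i), "y": float(i * 2)} for i in range(5)]

# No dimension checked for addressing: every mark is its own partition,
# so the script sees one row per call.
sizes_partitioned = run_script_per_partition(rows, lambda r: r["id"], report_size)

# Every dimension checked for addressing: one partition with all rows.
sizes_addressed = run_script_per_partition(rows, lambda r: 0, report_size)
```

In the first case the script runs five times with a single row each, which matches the (1, 2) shape seen in the question; in the second it runs once with all five rows.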
Upvotes: 3