NightLearner

Reputation: 305

Issues with converting a Python script into a Spotfire Python data function

I have a very simple script that does two things:

  1. sort a data frame by Column A and Column B
  2. create a new column (D) by labeling runs of consecutive rows that share the same value in Column C, increasing the label by 1 each time the value changes, so that every sequential group of identical values gets its own label.

My Python script is below and works great. I'm now trying to bring it into Spotfire as a Python data function and am having issues connecting it to input and output parameters.

Original Python script:

import pandas as pd
import numpy as np
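# sort_values returns a new DataFrame, so assign the result back
df = df.sort_values(['ColumnA', 'ColumnB'], ascending=[True, True])
# start a new label whenever ColumnC differs from the previous row
df['ColumnD'] = (df['ColumnC'] != df['ColumnC'].shift(1)).cumsum()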

I was trying to write my data function as:

import pandas as pd
import numpy as np
df.sort_values([A, B], ascending=[True, True])
D = (C != C.shift(1)).cumsum()

and make A, B, and C inputs and D an output, but it's not working. Any help is more than appreciated!

Upvotes: 0

Views: 703

Answers (1)

Gaia Paolini

Reputation: 1477

I am editing my previous answer because the example data was already sorted, so the actual problem was hidden. Spotfire assumes that the output column is in the same order as the input data table. If the data function sorts the table differently, the rows need to be sorted back to their original order before the column is output.

So I created a calculated column, ROWID, defined as rowid(), and passed it to the data function as an additional input. It represents the 'natural' order of the rows.
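To see why the order matters, here is a minimal sketch that runs outside Spotfire. The sample values are made up for illustration, and the ROWID column only mimics what rowid() would supply. It compares the D values as they sit in the sorted frame with the same values after sorting back by ROWID.

```python
import pandas as pd

# Hypothetical sample data; ROWID stands in for Spotfire's rowid() column.
df = pd.DataFrame({
    "ROWID": [1, 2, 3, 4],
    "A": [2, 1, 2, 1],
    "B": [1, 1, 2, 2],
    "C": [10, 20, 10, 20],
})

# Sort, then label each run of identical consecutive C values.
sorted_df = df.sort_values(["A", "B"], ascending=[True, True])
sorted_df["D"] = (sorted_df["C"] != sorted_df["C"].shift(1)).cumsum()

# D in sorted order: returning this as-is would attach labels to the wrong rows.
d_sorted_order = sorted_df["D"].tolist()

# D after restoring the original row order via ROWID.
d_original_order = sorted_df.sort_values("ROWID")["D"].tolist()

print(d_sorted_order)    # [1, 1, 2, 2]
print(d_original_order)  # [2, 1, 2, 1]
```

The two lists hold the same labels in different positions, which is the mismatch that the ROWID re-sort corrects.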

This is the code that worked:

import pandas as pd
import numpy as np

df = df.sort_values(['A', 'B'], ascending=[True, True])
df['D'] = (df['C'] != df['C'].shift(1)).cumsum()
# re-sort by ROWID before creating the column vector
df = df.sort_values(['ROWID'], ascending=[True])
D = df['D']
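As a possible variation, not the approach I used above, the same logic can be wrapped in a function and tested outside Spotfire. This sketch restores the order with the pandas index instead of a ROWID column. It assumes the incoming frame has a unique, increasing index (such as the default RangeIndex); I have not verified how Spotfire builds the index, so treat that as an assumption. The name label_runs is hypothetical.

```python
import pandas as pd


def label_runs(df: pd.DataFrame) -> pd.Series:
    """Label runs of identical C values after sorting by A and B.

    Returns the labels in the order the rows arrived in, assuming the
    incoming index is unique and increasing (e.g. a default RangeIndex).
    """
    ordered = df.sort_values(["A", "B"], ascending=[True, True])
    # A new run starts wherever C differs from the previous row.
    run_id = (ordered["C"] != ordered["C"].shift(1)).cumsum()
    # The index labels travelled with the rows, so sorting by index
    # puts the labels back in the original row order.
    return run_id.sort_index().rename("D")


# Hypothetical sample data for a quick check outside Spotfire.
sample = pd.DataFrame({
    "A": [2, 1, 2, 1],
    "B": [1, 1, 2, 2],
    "C": [10, 20, 10, 20],
})
D = label_runs(sample)
print(D.tolist())  # [2, 1, 2, 1]
```

If the index cannot be relied on, the explicit ROWID input is the safer choice.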

Upvotes: 1
