Nurbek Kuantyrov

Reputation: 177

Row value Difference

I'm trying to add a calculated column for certain groups of values that are identified as duplicates, but it is trickier than a unique-value identifier.

So my data table looks like this: AS-IS

Material   MvtType   Date      Order No.   Quantity   RowID
10000001   101       01/01/19  123452           1.0       1
10000001   101       03/01/19  123456           5.0       5
10000001   102       04/01/19  123456          -5.0       7
10000001   101       07/01/19  123456           7.0       8
10000001   101       01/02/19  123453          10.0       9
10000002   101       10/01/19  222221         100.0      44
10000002   101       11/01/19  222221         150.0      43
10000002   102       11/01/19  222221        -100.0      41
10000002   102       13/01/19  222221        -150.0      42
10000002   101       03/02/19  222221         200.0      50
10000002   101       06/02/19  222221         300.0      51

There are two movement types: 101 = receiving a material into stock, and 102 = returning it due to an error. Usually, very soon afterwards, the material is received into stock again (101) with corrected values (usually the quantity or the correct order number). The previous two rows are therefore duplicates and have to be ignored, so I'm trying to add a calculated column that flags the rows that are "duplicated" and need to be deleted.

I've tried adding a column that flags all rows with MvtType = 102 (something like: If([MvtType]=102, "delete", "ok")), since those should be ignored anyway. But I'm struggling to write an expression that also flags the last row with the same Material code, a Date equal to or earlier than the 102 row's, the same Order No., and the same quantity (with the opposite [+] sign), but with the other movement type, i.e. 101.
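To make the rule I'm after concrete, here is a plain-Python sketch (purely illustrative, outside Spotfire) of the pairing logic, run against the sample table above: each 102 row is deleted together with the latest not-yet-deleted 101 row that has the same Material and Order No., the opposite quantity, and an equal or earlier date.

```python
from datetime import date

# Sample rows: (Material, MvtType, Date, OrderNo, Quantity, RowID)
rows = [
    ("10000001", 101, date(2019, 1, 1),  "123452",    1.0,  1),
    ("10000001", 101, date(2019, 1, 3),  "123456",    5.0,  5),
    ("10000001", 102, date(2019, 1, 4),  "123456",   -5.0,  7),
    ("10000001", 101, date(2019, 1, 7),  "123456",    7.0,  8),
    ("10000001", 101, date(2019, 2, 1),  "123453",   10.0,  9),
    ("10000002", 101, date(2019, 1, 10), "222221",  100.0, 44),
    ("10000002", 101, date(2019, 1, 11), "222221",  150.0, 43),
    ("10000002", 102, date(2019, 1, 11), "222221", -100.0, 41),
    ("10000002", 102, date(2019, 1, 13), "222221", -150.0, 42),
    ("10000002", 101, date(2019, 2, 3),  "222221",  200.0, 50),
    ("10000002", 101, date(2019, 2, 6),  "222221",  300.0, 51),
]

def flag_duplicates(rows):
    flags = ["OK"] * len(rows)
    for i, (mat, mvt, d, order, qty, _) in enumerate(rows):
        if mvt != 102:
            continue
        flags[i] = "delete"  # every 102 row is deleted
        # 101 rows it could cancel: same Material and Order No.,
        # opposite quantity, date on or before the 102 row's date
        candidates = [
            j for j, (m2, mvt2, d2, o2, q2, _) in enumerate(rows)
            if mvt2 == 101 and m2 == mat and o2 == order
            and q2 == -qty and d2 <= d and flags[j] == "OK"
        ]
        if candidates:
            # delete the latest matching 101 row
            flags[max(candidates, key=lambda j: rows[j][2])] = "delete"
    return flags

flags = flag_duplicates(rows)
```

This reproduces the "Calculated Column" values in the TO-BE table below.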

TO-BE:

    Material   MvtType   Date      Order No. Quantity   RowID   Calculated Column
    10000001   101        01/01/19  123452    1.0           1   OK
    10000001   101        03/01/19  123456    5.0           5   delete
    10000001   102        04/01/19  123456   -5.0           7   delete
    10000001   101        07/01/19  123456    7.0           8   OK
    10000001   101        01/02/19  123453    10.0          9   OK
    10000002   101        10/01/19  222221    100.0        44   delete 
    10000002   101        11/01/19  222221    150.0        43   delete
    10000002   102        11/01/19  222221   -100.0        41   delete
    10000002   102        13/01/19  222221   -150.0        42   delete
    10000002   101        03/02/19  222221    200.0        50   OK
    10000002   101        06/02/19  222221    300.0        51   OK

P.S.: RowID is the row order from the source data, and I believe it can change if custom sorting is applied, since the data comes from Excel. The example data is sorted by Material→Date→Quantity in Spotfire, hence the row order here differs from the source data.

Any ideas on how to add such a calculated column?

Upvotes: 2

Views: 1167

Answers (4)

JCS_1006

Reputation: 1

If you go into your data canvas and add a "Filter data" transformation with

If(Rank(Baserowid(),"asc",[myGroup1])=1,TRUE, FALSE)

it will filter out all your duplicates (where [myGroup1] is the column that defines a group of duplicates).

Upvotes: 0

BMAnalytics

Reputation: 114

You can do this with a data function. This blog post -- https://datashoptalk.com/data-function-properties-part5/ -- explains the basics of TERR data functions, using as its example a data function that identifies duplicate records. Scroll down to where it says:

"With that said, here is a very simple script that I'll reference throughout the post. This single line of R code will check for duplicates in a data set using two identifiers that define the granularity of the table. The output is a column called duplicate."

duplicate <- duplicated(data.frame(key1, key2))

So, it's one line of code that identifies duplicate records based on two columns of data. You could easily modify it to use as many columns as you need.
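For anyone without TERR to hand, the behaviour of R's duplicated() over a pair of key columns is easy to mimic; this plain-Python sketch (illustrative only, not Spotfire or TERR code) marks each row whose composite key has already appeared earlier:

```python
def duplicated(keys):
    """Like R's duplicated(): True for each key seen earlier in the sequence."""
    seen = set()
    flags = []
    for k in keys:
        flags.append(k in seen)
        seen.add(k)
    return flags

# Build the composite key from two identifier columns (hypothetical sample data):
material = ["10000001", "10000001", "10000001"]
order_no = ["123456", "123457", "123456"]
duplicate = duplicated(list(zip(material, order_no)))
```

Only the first occurrence of each (material, order_no) pair is kept as non-duplicate, matching R's first-wins semantics.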

Upvotes: 0

Tyger Guzman

Reputation: 758

You can identify duplicate rows with an expression, using as many or as few columns as you like to check. Here is a great article

If(Rank(Baserowid(),"asc",[myGroup1])=1,"First row of group","Duplicate")

If(Rank(Baserowid(),"asc",[myGroup1],[myGroup2],[myGroup3])=1,"First row of group","Duplicate")

Upvotes: 0

Gaia Paolini

Reputation: 1452

I'm not sure this logic is watertight, but at the price of first calculating a column called AbsQuantity as Abs([Quantity]), you could define your flag as

If(Count([RowID]) OVER (Intersect([Material],[Order No.],[AbsQuantity]))=2,"delete","OK")

assuming that if there are two rows with the same Material, Order No. and absolute value of Quantity, they are meant to cancel each other out. The reason for calculating the AbsQuantity column first is that OVER does not support expressions.
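In plain Python (illustrative only, not Spotfire syntax), that Count() OVER Intersect([Material],[Order No.],[AbsQuantity]) amounts to counting rows per (Material, Order No., Abs(Quantity)) key and flagging the keys that occur exactly twice:

```python
from collections import Counter

def flag_cancelling_pairs(rows):
    # rows: (Material, OrderNo, Quantity) tuples
    keys = [(m, o, abs(q)) for m, o, q in rows]
    counts = Counter(keys)  # plays the role of Count([RowID]) OVER Intersect(...)
    return ["delete" if counts[k] == 2 else "OK" for k in keys]
```

On material 10000001 from the question this marks the 5.0/-5.0 pair "delete" and leaves 1.0, 7.0 and 10.0 as "OK". Note the caveat above: it assumes a cancelling pair never shares its absolute quantity with a third row in the same group.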

Upvotes: 1
