rahram

Reputation: 590

How to drop duplicates in a Python datatable (h2oai)

The datatable package in Python (https://github.com/h2oai/datatable/) can count the number of unique values in a column. Is there a way to drop duplicate values with this package, or do I have to use the slow pandas package?

Upvotes: 3

Views: 904

Answers (1)

Pasha

Reputation: 6560

If you want to find the unique values in a single column, you can use the function dt.unique(), which takes a column and returns a new column containing the unique values from the original:

>>> import datatable as dt
>>> DT = dt.Frame(A=[1, 3, 2, 1, 4, 2, 1], B=list("ABCDEFG"))
>>> dt.unique(DT["A"])
   |  A
-- + --
 0 |  1
 1 |  2
 2 |  3
 3 |  4

[4 rows x 1 column]

If, on the other hand, you have a multi-column Frame and want to keep only one row for each unique value in one of the columns, this is equivalent to grouping by that column and taking the first row of each group:

>>> from datatable import f, by, first
>>> DT[:, first(f[1:]), by(f[0])]
   |  A  B 
-- + --  --
 0 |  1  A 
 1 |  2  C 
 2 |  3  B 
 3 |  4  E 

[4 rows x 2 columns]
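
To make it clearer what the group-by approach keeps, here is a minimal plain-Python sketch of the same logic (standard library only, not datatable code). The helper name first_row_per_key is hypothetical, and the sketch assumes groups come back in sorted key order, as they do in the output above.

```python
def first_row_per_key(rows, key_index=0):
    """Keep the first row seen for each key; return rows sorted by key.

    Hypothetical helper that mimics the group-by-and-take-first idea
    in plain Python. It is an illustration, not part of datatable.
    """
    first_seen = {}
    for row in rows:
        # setdefault stores the row only if the key is not present yet,
        # so later duplicates of the same key are ignored.
        first_seen.setdefault(row[key_index], row)
    # Assumption: groups are emitted in sorted key order.
    return [first_seen[key] for key in sorted(first_seen)]


# Same data as the Frame above: column A and column B, row by row.
rows = list(zip([1, 3, 2, 1, 4, 2, 1], "ABCDEFG"))
deduplicated = first_row_per_key(rows)
print(deduplicated)  # [(1, 'A'), (2, 'C'), (3, 'B'), (4, 'E')]
```

Note that the rows kept for A=1 and A=2 are the first occurrences ("A" and "C"), which matches the Frame output above.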

Upvotes: 8
