Reputation: 590
The datatable package in Python (https://github.com/h2oai/datatable/) can count the number of unique values in a column. Is there a way to drop duplicate values with this package, or do I have to use the slow pandas package?
Upvotes: 3
Views: 904
Reputation: 6560
If you want to find the unique values in a single column, you can use the function dt.unique(), which takes a column and returns a new column with all unique values from the original:
>>> import datatable as dt
>>> DT = dt.Frame(A=[1, 3, 2, 1, 4, 2, 1], B=list("ABCDEFG"))
>>> dt.unique(DT["A"])
| A
-- + --
0 | 1
1 | 2
2 | 3
3 | 4
[4 rows x 1 column]
If, on the other hand, you have a multi-column Frame and want to keep only one row for each unique value in one of the columns, then this is equivalent to grouping by that column and taking the first row of each group:
>>> from datatable import f, by, first
>>> DT[:, first(f[1:]), by(f[0])]
| A B
-- + -- --
0 | 1 A
1 | 2 C
2 | 3 B
3 | 4 E
[4 rows x 2 columns]
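To make the "group by the key column, keep the first row" semantics concrete, here is a minimal plain-Python sketch that does not use datatable at all. The helper name first_row_per_key is hypothetical and only for illustration; it is not part of any library. It assumes the result is ordered by key, which matches the output shown above.

```python
def first_row_per_key(rows, key_index=0):
    """Keep the first row seen for each distinct key, ordered by key.

    Hypothetical helper for illustration; not a datatable function.
    """
    first_seen = {}
    for row in rows:
        # setdefault stores the row only when the key has not been seen yet,
        # so later duplicates of the same key are ignored.
        first_seen.setdefault(row[key_index], row)
    # Return rows ordered by key, mirroring the grouped output above.
    return [first_seen[key] for key in sorted(first_seen)]


# Same data as the Frame in the answer: column A is the key, B is the payload.
A = [1, 3, 2, 1, 4, 2, 1]
B = list("ABCDEFG")
result = first_row_per_key(list(zip(A, B)))
print(result)  # [(1, 'A'), (2, 'C'), (3, 'B'), (4, 'E')]
```

This is only a model of the behaviour for small data; on a real Frame the grouped expression above is the way to go, since it stays inside datatable.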
Upvotes: 8