Here the letters in the first column (x1) are IDs.
data = DataFrame(["A" 2 3 4; "B" 1 2 3; "C" 2 1 2; "D" 2 4 9], :auto)
4 rows × 4 columns
     x1   x2   x3   x4
     Any  Any  Any  Any
  1  A    2    3    4
  2  B    1    2    3
  3  C    2    1    2
  4  D    2    4    9
Suppose my sampled data is:
2 rows × 4 columns
     x1   x2   x3   x4
     Any  Any  Any  Any
  1  D    2    4    9
  2  A    2    3    4
I want to get the remaining data, i.e. the rows other than D and A. Here I can get it simply by selecting rows 2 and 3, but I need another method for large datasets.
> But I need another method for large datasets.
The size of the data set does not matter here. If you have any row selector (call it rows) that you used to select the rows with "D" and "A", then just use Not(rows) as the row selector, i.e. data[Not(rows), :], to select all the remaining rows.
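A minimal sketch of the Not(rows) approach, assuming the sample is drawn by picking row indices (here with randperm from the Random standard library; the names rows, sampled and remaining are illustrative, not from the question). Keeping the indices lets the same selector produce both the sample and its complement:

```julia
using DataFrames
using Random

data = DataFrame(["A" 2 3 4; "B" 1 2 3; "C" 2 1 2; "D" 2 4 9], :auto)

# Hypothetical sampling step: draw 2 distinct row indices and keep them.
# Sorting just keeps the sampled rows in their original table order.
rows = sort(randperm(nrow(data))[1:2])

sampled = data[rows, :]         # the sampled rows
remaining = data[Not(rows), :]  # every row whose index is not in `rows`
```

Because Not(rows) only needs the indices, this costs the same regardless of what the ID column contains.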
If you do not have the row selector that you used to create the first data frame, here is an efficient pattern that performs the required selection:
data[(!in(Set(["A", "D"]))).(data.x1), :]
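If all you kept is the sampled data frame itself, one way to apply the same pattern is to build the Set of IDs from it. This sketch assumes the IDs are in column x1 and are unique; sampled, ids and keep are hypothetical names. in(ids) is the curried form of in, ! negates that predicate, and broadcasting it over the ID column yields a Bool mask. Using a Set makes each membership test a hash lookup instead of a scan:

```julia
using DataFrames

data = DataFrame(["A" 2 3 4; "B" 1 2 3; "C" 2 1 2; "D" 2 4 9], :auto)
sampled = data[[4, 1], :]  # hypothetical sample holding rows D and A

ids = Set(sampled.x1)        # IDs to exclude, stored for fast lookup
keep = (!in(ids)).(data.x1)  # true where the row's ID is not in the sample
remaining = data[keep, :]    # rows B and C
```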
If you want something simpler to understand but less efficient, you can use:
data[[!(v in ["A", "D"]) for v in data.x1], :]
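A small sketch to check that the two forms select the same rows; the difference is only cost. v in excluded scans the vector for every row, so the work grows with rows × excluded IDs, while the Set version does one hash lookup per row. The names excluded, mask_vec and mask_set are illustrative:

```julia
using DataFrames

data = DataFrame(["A" 2 3 4; "B" 1 2 3; "C" 2 1 2; "D" 2 4 9], :auto)
excluded = ["A", "D"]

# Comprehension: linear scan of `excluded` for every row
mask_vec = [!(v in excluded) for v in data.x1]
# Broadcast over a Set: one hash lookup per row
mask_set = (!in(Set(excluded))).(data.x1)

same = mask_vec == mask_set  # element-wise comparison of the two masks
```

With only two excluded IDs the difference is negligible; it matters when the sample (and so the exclusion list) is large.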
As a side comment (since you want to work with large data sets): it is more efficient for your columns to have an element type more specific than Any. You can narrow the column types by writing data = identity.(data).
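A quick way to see the effect of identity.(data) is to compare the column element types before and after. This is a sketch; before and after are illustrative names. Broadcasting identity rebuilds each column, and Julia picks the element type from the actual values:

```julia
using DataFrames

data = DataFrame(["A" 2 3 4; "B" 1 2 3; "C" 2 1 2; "D" 2 4 9], :auto)
before = eltype.(eachcol(data))  # every column is Any (built from a Matrix{Any})

data = identity.(data)           # rebuild columns with narrowed element types
after = eltype.(eachcol(data))   # x1 becomes String, x2 to x4 become Int
```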
If anything I have written above is unclear, please ask in a comment for an explanation.