Reputation: 23

Get remaining dataframe after subsetting

Here the letters in column x1 are IDs.

data=DataFrame(["A" 2 3 4; "B" 1 2 3;"C" 2 1 2;"D" 2 4 9],:auto)

4 rows × 4 columns
x1  x2  x3  x4
Any Any Any Any
1   A   2   3   4
2   B   1   2   3
3   C   2   1   2
4   D   2   4   9

Suppose my sampled data is:

2 rows × 4 columns
x1  x2  x3  x4
Any Any Any Any
1   D   2   4   9
2   A   2   3   4

I want to get the remaining data, i.e. everything except the rows with D and A. Here I can get it simply by selecting rows 2 and 3. But I need other method for large datasets.

Upvotes: 2

Answers (1)

Bogumił Kamiński

Reputation: 69949

But I need other method for large datasets.

The size of the data set does not matter here. If you have the row selector that you used to pick the rows with "D" and "A" (call it rows), then just use the Not(rows) selector to select all the remaining rows.
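A minimal sketch of the Not(rows) approach, assuming the sample was drawn by integer row indices. The name rows and the values [4, 1] are illustrative and not taken from the question; any selector DataFrames accepts for rows should work the same way.

```julia
using DataFrames

data = DataFrame(["A" 2 3 4; "B" 1 2 3; "C" 2 1 2; "D" 2 4 9], :auto)

# Hypothetical selector: the row indices used to draw the sample ("D" and "A").
rows = [4, 1]

sampled = data[rows, :]          # the sample: rows "D" and "A"
remaining = data[Not(rows), :]   # every row not listed in `rows`, in original order
```

Here remaining holds the rows with IDs "B" and "C".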

If you do not have the row selector that was used to create the first data frame, here is an efficient pattern that performs the required selection on the ID column x1:

data[(!in(Set(["A", "D"]))).(data.x1), :]

If you want something simpler to understand but less efficient you can use:

data[[!(v in ["A", "D"]) for v in data.x1], :]
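When the IDs to exclude are not known in advance, they can be read from the sampled data frame itself. This is a sketch under the assumption that the sample is available as a data frame with the same ID column x1; the names sampled, ids and remaining are illustrative.

```julia
using DataFrames

data = DataFrame(["A" 2 3 4; "B" 1 2 3; "C" 2 1 2; "D" 2 4 9], :auto)
sampled = data[[4, 1], :]   # stand-in for the sampled data frame from the question

ids = Set(sampled.x1)                       # IDs to exclude; a Set gives fast membership tests
remaining = data[(!in(ids)).(data.x1), :]   # keep rows whose ID is not in the sample
```

Building the Set once and broadcasting the membership test keeps the cost roughly proportional to the number of rows, which is why this form suits large data sets better than scanning a vector of IDs for every row.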

As a side comment (since you want to work with large data sets): it is more efficient for your columns to have an element type more specific than Any. You can narrow the column types by writing data = identity.(data).
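A short sketch of the narrowing step on the example data. The name narrowed is illustrative; the columns start as Any because the source matrix mixes strings and integers.

```julia
using DataFrames

data = DataFrame(["A" 2 3 4; "B" 1 2 3; "C" 2 1 2; "D" 2 4 9], :auto)
eltype.(eachcol(data))       # every column has element type Any

narrowed = identity.(data)   # broadcasting identity rebuilds each column with a concrete type
eltype.(eachcol(narrowed))   # x1 holds String; x2, x3 and x4 hold Int
```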

If anything I have written above is unclear, please ask for an explanation in a comment.

Upvotes: 3
