Reputation: 874
I am trying to select certain data rows in a DataFrame
using @linq
macros:
using DataFrames, DataFramesMeta
df=DataFrame(x = ["a", "a", "a", "b", "b", "b"],
y = [1, 2, 3, 2, 3, 4],
z = [100, 200, 300, 456, 345, 234])
| Row | x | y | z |
|-----|-----|---|-----|
| 1 | "a" | 1 | 100 |
| 2 | "a" | 2 | 200 |
| 3 | "a" | 3 | 300 |
| 4 | "b" | 2 | 456 |
| 5 | "b" | 3 | 345 |
| 6 | "b" | 4 | 234 |
I am trying to select those rows that have the maximum y
for a given type of x
, that is
| Row | x | y | z |
|-----|-----|---|-----|
| 1 | "a" | 3 | 300 |
| 2 | "b" | 4 | 234 |
So, I am grouping by column x
and adding a column with the maxima
@linq df |> @by(:x, maxY = maximum(:y))
which gives
| Row | x | maxY |
|-----|-----|------|
| 1 | "a" | 3 |
| 2 | "b" | 4 |
but I don't see how to put the corresponding z entries back in. Probably, it would be join
but I don't see how to do that or get the result in another, simple way.
Upvotes: 1
Views: 506
Reputation: 1275
You can do it in one line joining on=[:x,:y]
but for this to work you need to name the maximum(:y)
column y
not maxY
:
df2 = @linq df |> by(:x, y=maximum(:y)) |> join(df, on=[:x, :y])
You can later rename that column to the intended maxY
:
rename!(df2, :y, :maxY)
Upvotes: 1