Ferenc
Ferenc

Reputation: 874

Selecting rows in DataFrame by/after grouping using DataFramesMeta in Julia

I am trying to select certain data rows in a DataFrame using @linq macros:

using DataFrames, DataFramesMeta
df=DataFrame(x = ["a", "a", "a", "b", "b", "b"], 
             y = [1, 2, 3, 2, 3, 4], 
             z = [100, 200, 300, 456, 345, 234])

| Row | x   | y | z   |
|-----|-----|---|-----|
| 1   | "a" | 1 | 100 |
| 2   | "a" | 2 | 200 |
| 3   | "a" | 3 | 300 |
| 4   | "b" | 2 | 456 |
| 5   | "b" | 3 | 345 |
| 6   | "b" | 4 | 234 |

I am trying to select those rows that have the maximum y for a given type of x, that is

| Row | x   | y | z   |
|-----|-----|---|-----|
| 1   | "a" | 3 | 300 |
| 2   | "b" | 4 | 234 |

So, I am grouping by column x and adding a column with the maxima

@linq df |> @by(:x, maxY = maximum(:y)) 

which gives

| Row | x   | maxY |
|-----|-----|------|
| 1   | "a" | 3    |
| 2   | "b" | 4    |

but I don't see how to put the corresponding z entries back in. Probably, it would be join but I don't see how to do that or get the result in another, simple way.

Upvotes: 1

Views: 506

Answers (1)

mmagnuski
mmagnuski

Reputation: 1275

You can do it in one line joining on=[:x,:y] but for this to work you need to name the maximum(:y) column y not maxY:

df2 = @linq df |> by(:x, y=maximum(:y)) |> join(df, on=[:x, :y])

You can later rename that column to the intended maxY:

rename!(df2, :y, :maxY)

Upvotes: 1

Related Questions