Ole Petersen

Reputation: 680

How to sort a specific column in a DataFrame in SparkR

In SparkR I have a DataFrame data with the columns time, game, and id.

head(data)

then gives id = 1, 4, 1, 1, 215, 985, ..., game = 1, 5, 1, 10, ... and time = 2012-2-1, 2013-9-9, .... The game column contains a game type, which is a number from 1 to 10.

For a given game type I want to find the minimum time, meaning the first time that game was played. For game type 1 I do this:

data1 <- filter(data, data$game == 1)

This new DataFrame contains all the data for game type 1. To find the minimum time I do this:

g <- groupBy(data1, data1$time)
first(arrange(g, desc(g$time)))

but this does not run in SparkR; it fails with the error "object of type 'S4' is not subsettable".

Game 1 has been played on 2012-01-02, 2013-05-04, 2011-01-04, ...; I would like to find the minimum time.
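For reference, a minimal sketch that reproduces this setup (assuming a Spark 2.x-style SparkR session; the values are invented to mirror the head(data) output above):

library(SparkR)
sparkR.session()

local_df <- data.frame(
  id   = c(1, 4, 1, 1, 215, 985),
  game = c(1, 5, 1, 10, 2, 1),
  time = as.Date(c("2012-01-02", "2013-09-09", "2013-05-04",
                   "2011-01-04", "2012-02-01", "2014-07-07"))
)
data <- createDataFrame(local_df)  # a SparkR DataFrame with id, game, time
head(data)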

Upvotes: 0

Views: 1498

Answers (3)

Niels

Reputation: 49

Just to clarify, because this is something I keep running into: the error you were getting is probably because you also imported dplyr into your environment, and dplyr masks SparkR's filter, arrange, desc, and first. If you had used SparkR::first(SparkR::arrange(g, SparkR::desc(g$time))), things would probably have been fine (although the query could obviously have been more efficient).
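A minimal sketch of that fix, assuming data and g as in the question; qualifying every call with SparkR:: keeps dplyr's versions of these functions from being picked up:

g <- SparkR::filter(data, data$game == 1)

# ascending sort + first row = the earliest (minimum) time
SparkR::first(SparkR::arrange(g, g$time))

# descending sort + first row = the latest time, as in the original snippet
SparkR::first(SparkR::arrange(g, SparkR::desc(g$time)))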

Upvotes: 0

zero323

Reputation: 330353

If all you want is the minimum time, sorting the whole data set doesn't make sense. You can simply use min:

agg(df, min(df$time))

or for each type of game:

groupBy(df, df$game) %>% agg(min(df$time))
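The aggregate returns another DataFrame, so the result still has to be brought back to R. A hedged usage sketch, assuming df as above (note that %>% itself comes from magrittr, attached with dplyr, not from SparkR; without it, nest the calls):

min_times <- agg(groupBy(df, df$game), min(df$time))

head(min_times)     # peek at the first rows
collect(min_times)  # pull the full result into a local data.frame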

Upvotes: 1

Ole Petersen

Reputation: 680

By typing

arrange(game, game$time)

I get all the times sorted. Applying the first function then gives the first (earliest) entry. If I want the last entry, I simply type this:

first(arrange(game, desc(game$time)))
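If only the minimum is needed, the aggregate from zero323's answer avoids sorting the whole DataFrame; applied to the same game DataFrame that would be:

agg(game, min(game$time))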

Upvotes: 1
