Reputation: 680
In SparkR I have a DataFrame data. It contains time, game and id.
head(data)
then gives ID = 1 4 1 1 215 985 ..., game = 1 5 1 10 and time = 2012-2-1, 2013-9-9, ...
Now game contains a gametype, which is a number from 1 to 10.
For a given gametype I want to find the minimum time, meaning the first time this game has been played. For gametype 1 I do this:
data1 <- filter(data, data$game == 1)
This new DataFrame contains all the data for gametype 1. To find the minimum time I do this:
g <- groupBy(data1, game$time)
first(arrange(g, desc(g$time)))
but this won't run in SparkR. It says "object of type 'S4' is not subsettable".
Game 1 has been played 2012-01-02, 2013-05-04, 2011-01-04, ... and I would like to find the minimum time.
Upvotes: 0
Views: 1498
Reputation: 49
Just to clarify, because this is something I keep running into: the error you were getting is probably because you also imported dplyr into your environment, which masks several SparkR functions. If you had used SparkR::first(SparkR::arrange(g, SparkR::desc(g$time))) things would probably have been fine (although obviously the query could have been more efficient).
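To make the masking visible, here is a minimal sketch; it assumes both packages are attached and that g is a SparkR DataFrame with a time column, as in the question:

```r
# dplyr masks SparkR generics such as filter, arrange and first
# when it is attached after SparkR.
library(SparkR)
library(dplyr)

conflicts()  # lists masked names, e.g. "arrange" "filter" "first"

# Qualifying the calls restores the SparkR methods:
result <- SparkR::first(SparkR::arrange(g, SparkR::desc(g$time)))
```

Attaching dplyr before SparkR (or never attaching it and using dplyr:: prefixes instead) avoids the problem in the other direction.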
Upvotes: 0
Reputation: 330353
If all you want is a minimum time, sorting a whole data set doesn't make sense. You can simply use min:
agg(df, min(df$time))
or for each type of game:
groupBy(df, df$game) %>% agg(min(df$time))
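Putting the two variants together, a minimal sketch, assuming a running Spark session and a SparkR DataFrame df with columns game and time as in the question:

```r
library(SparkR)
library(magrittr)  # provides %>%; SparkR itself does not export a pipe

# Earliest time for one gametype, without sorting:
head(agg(filter(df, df$game == 1), min(df$time)))

# Earliest time for every gametype in a single pass:
perGame <- groupBy(df, df$game) %>% agg(min(df$time))
head(arrange(perGame, perGame$game))
```

The aggregation runs in one scan over the data, whereas arrange followed by first forces a full sort first.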
Upvotes: 1
Reputation: 680
By typing
arrange(game, game$time)
I get all of the time values sorted. By applying the first function I get the first entry. If I want the last entry I simply type this:
first(arrange(game, desc(game$time)))
Upvotes: 1