Reputation: 3757
I'm not at all familiar with R. I have basic competency using MATLAB which I do like a lot.
My current task involves gathering nice statistical data and analyzing it. I have scraped my data from JSON into a data frame by using fromJSON in the RJSONIO lib. Then I removed NULL
values with
Stats <- lapply(Stats, function(x) {
x[sapply(x, is.null)] <- NA
unlist(x)})
then called
DF<-do.call("rbind", Stats)
to get the JSON values packed into a data frame. This left me with full atomic vectors, so this was cleaned up using
DF<-as.data.frame(DF)
Now I'm left with a DF where I'd like to be able to perform query-esque (SQL) calculations, ie.. My research has led me to a library called sqldf, but it seems to have many dependencies that I can't get running on my machine. Still looking for a solution.
List all people who are on team "NYI"
or
Find total number of goals for team
or even most helpful
Lookup data by name or other key value
Sample data frame:
_ Name Team Opponent
167 Matt Carkner NYI PHI
168 Keith Ballard MIN FLA
169 Willie Mitchell FLA MIN
170 Rob Scuderi PIT BOS
171 Nate Prosser MIN FLA
172 Nick Schultz PHI NYI
Upvotes: 1
Views: 406
Reputation: 3757
SQLDF turned out to be the PERFECT solution to my type of question.
I simply installed the package from the R console
install.packages("sqldf")
Since I'm running on an Apple system, some Xcode dependencies needed installing, but Software Update took care of that for me autonomously.
Then I simply used the
library(sqldf)
command to begin running SQL queries on my data frames.
Use is very much so similar to standard SQL, return is new DF with requested attributes. Wrapped up in a view tag makes it a perfect tool now.
View(sqldf("SELECT * from skaters WHERE Team='PHI'"))
Upvotes: 1
Reputation: 3501
Suggest data.table
package.
# toy data
# I presume your first column of data is number of "goals"
df <- structure(list(Goals = 167:172, Name = structure(c(2L, 1L, 6L,
5L, 3L, 4L), .Label = c("Keith Ballard", "Matt Carkner", "Nate Prosser",
"Nick Schultz", "Rob Scuderi", "Willie Mitchell"), class = "factor"),
Team = structure(c(3L, 2L, 1L, 5L, 2L, 4L), .Label = c("FLA",
"MIN", "NYI", "PHI", "PIT"), class = "factor"), Opponent = structure(c(5L,
2L, 3L, 1L, 2L, 4L), .Label = c("BOS", "FLA", "MIN", "NYI",
"PHI"), class = "factor")), .Names = c("Goals", "Name", "Team",
"Opponent"), class = c("data.table", "data.frame"), row.names = c(NA,
-6L), .internal.selfref = <pointer: 0x001924a0>, index = structure(integer(0), "`__Team`" = c(3L,
2L, 5L, 1L, 6L, 4L)))
illustrations on a few operations
setDT(df) # convert data frame to data.table
df[Team == "NYI",] # there's faster approach by first keying the concerned column
# you get
Goals Name Team Opponent
1: 167 Matt Carkner NYI PHI
df[,list(ttl_goals = sum(Goals)), by=Name] # to get total goals per player
# you get
Name ttl_goals
1: Matt Carkner 167
2: Keith Ballard 168
3: Willie Mitchell 169
4: Rob Scuderi 170
5: Nate Prosser 171
6: Nick Schultz 172
Upvotes: 2