Reputation: 741
In R, I would like to find the number of occurrences of each unique row of a data frame in the fastest way possible.
I have more than 2 million rows, but the data fits in the memory of my 16 GB machine. table and ftable are fast, but the number of unique combinations is more than they can handle, so I receive an error message.
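For example, something like this (df is just a placeholder; my real data frame has far more distinct values per column):

# Placeholder data frame standing in for the real 2-million-row data
df <- data.frame(a = sample(letters, 2e6, replace = TRUE),
                 b = sample(1:1000, 2e6, replace = TRUE))

# table()/ftable() allocate a cell for every possible combination of values,
# so with many high-cardinality columns the result gets too large to build
table(df$a, df$b)
ftable(df$a, df$b)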
Thanks,
Steve
Upvotes: 2
Views: 2539
Reputation: 610
# Counts how many elements of leGroData are equal to leX
countNbOccurrences <- function(leX, leGroData) {
  sum(leX == leGroData)
}

# theRow: the unique row keys; fullListOfRows: one key per row of the data frame
sapply(theRow, countNbOccurrences, leGroData = fullListOfRows)
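For example, assuming theRow and fullListOfRows are built by collapsing each row of the data frame (df here is a placeholder) into a single string key:

# Collapse every row into one string key
fullListOfRows <- apply(df, 1, paste, collapse = ".")
theRow <- unique(fullListOfRows)

# Named vector giving how often each unique key occurs
sapply(theRow, countNbOccurrences, leGroData = fullListOfRows)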
Upvotes: 0
Reputation: 11946
Use count from the plyr package. It avoids combinations that do not occur in the data (unlike table and similar functions).
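A minimal sketch, using the built-in cars data frame (with some duplicated rows) as a stand-in for your data:

library(plyr)

# Make a small data frame containing duplicate rows
cars2 <- cars[sample(nrow(cars), 20, replace = TRUE), ]

# count() returns one row per unique combination plus a freq column;
# combinations that never occur in the data are simply absent
count(cars2)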
Upvotes: 3
Reputation: 7561
This problem can be solved with SQL (here I use the sqldf package). Sample data from @DWin's answer.
library(sqldf)
# Occurrences of each unique row
sqldf("SELECT speed, dist, COUNT(*) AS N FROM cars2 GROUP BY speed, dist")
# Some statistics on the occurrences ;)
sqldf("SELECT N, COUNT(N) AS Freq FROM
       (SELECT COUNT(*) AS N FROM cars2 GROUP BY speed, dist)
       GROUP BY N")
Upvotes: 1
Reputation: 263352
If the question is how many unique rows there are:
sum(!duplicated(dfrm))
If the question is how to get the unique rows themselves:
dfrm[!duplicated(dfrm), ]
If you want a table of the unique combinations, consider this example with the built-in data frame cars:
cars2 <- cars[sample(1:10, 20, replace = TRUE), ]  # to make some duplicates
table(apply(cars2, 1, paste, sep = ".", collapse = "."))
# output #
10.18 10.26 10.34 11.17 4.10 4.2 7.22 7.4 8.16
    2     3     3     3    3   1    1   2    2
Upvotes: 1