Steve Harman

Reputation: 741

Finding the number of occurrences for rows

In R, I would like to find the number of occurrences for the unique rows of a data frame in the fastest way possible.

I have more than 2 million rows, but the data fits in memory on my 16 GB machine. table and ftable are fast, but the number of unique combinations is more than they can handle, so I receive an error message.

Thanks,

Steve

Upvotes: 2

Views: 2539

Answers (4)

Aname

Reputation: 610

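# Count how many elements of leGroData are equal to leX
countNbOccurrences <- function(leX, leGroData) {
    sum(leX == leGroData)
}

# theRow (the values to count) and fullListOfRows (all values) must already be defined
sapply(theRow, countNbOccurrences, leGroData = fullListOfRows)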

Upvotes: 0

Nick Sabbe

Reputation: 11946

Use count from the plyr package. Unlike table and similar functions, it only returns combinations that actually occur in the data.
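A minimal sketch of what that looks like, assuming plyr is installed. The data frame df and its columns a and b are made up for illustration and are not from the question:

```r
library(plyr)  # assumption: the plyr package is installed

# Hypothetical data frame with some repeated rows
df <- data.frame(a = c(1, 1, 2, 2, 2),
                 b = c("x", "x", "y", "y", "z"))

# count() with no vars argument groups on all columns and returns one row
# per combination that occurs, with the number of occurrences in `freq`
res <- count(df)
print(res)
```

Because only observed combinations are returned, the result has at most as many rows as the input, regardless of how many distinct values each column has.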

Upvotes: 3

Wojciech Sobala

Reputation: 7561

This problem can be solved with SQL (here I use the sqldf package). The sample data, cars2, comes from @DWin's answer.

library(sqldf)

# Occurrences of each unique row
sqldf("SELECT speed, dist, COUNT(*) AS N FROM cars2 GROUP BY speed, dist")
# Some statistics on the occurrences ;)
sqldf("SELECT N, COUNT(N) AS Freq FROM
           (SELECT COUNT(*) AS N FROM cars2 GROUP BY speed, dist)
       GROUP BY N")

Upvotes: 1

IRTFM

Reputation: 263352

If the question was to get the number of unique lines:

sum(!duplicated(dfrm))

If the question was to get the unique lines themselves:

dfrm[!duplicated(dfrm), ]

If you want a table of counts for the unique combinations, consider this example with the built-in data frame cars:

cars2 <- cars[sample(1:10, 20, replace=TRUE), ]  # to make some dups
table(apply(cars2,1,paste, sep=".", collapse="."))

# output #
10.18 10.26 10.34 11.17  4.10   4.2  7.22   7.4  8.16 
    2     3     3     3     3     1     1     2     2 
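On a data frame with millions of rows, calling apply over every row may be slow. A possible variant, offered as a sketch rather than a benchmarked recommendation, builds the same keys column-wise with do.call(paste, ...). The set.seed call and the names keys and counts are additions for illustration:

```r
set.seed(1)  # assumption: added only so the sketch is reproducible
cars2 <- cars[sample(1:10, 20, replace = TRUE), ]  # to make some dups

# Paste the columns together element-wise: one "speed.dist" key per row
keys <- do.call(paste, c(cars2, sep = "."))

# Tabulate the keys; only combinations that occur get a cell
counts <- table(keys)
print(counts)
```

Since the table is built from a single character vector, its size grows with the number of distinct rows rather than with the product of the distinct values in each column.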

Upvotes: 1
