Reputation: 629
I have a large dataset that I am reading into R.
I want to apply the unique() function to it so that it is easier to work with, but when I try, I get this error:
clients <- unique(clients)
Error: cannot allocate vector of size 27.9 Mb
So I am trying to apply the function chunk by chunk, like this:
clientsmd<-data.frame()
n<-7316738 #Amount of observations in the dataset
t<-0
for(i in 1:200){
clientsm<-clients[1+(t*round((n/200))):(t+1)*round((n/200)),]
clientsm<-unique(clientsm)
clientsmd<-rbind(clientsm)
t<-(t+1) }
But I get this:
Error in `[.default`(xj, i) : subscript too large for 32-bit R
I have been told that this would be easier with packages such as "ff" or "bigmemory" (or any other), but I don't know how to use them for this purpose.
I would appreciate any guidance, whether it explains why my code doesn't work or shows how I could take advantage of these packages.
Upvotes: 0
Views: 867
Reputation: 3833
Increase your memory limit as shown below, then try running it again.
memory.limit(4000) ## windows specific command
Upvotes: 1
Reputation: 1102
Is clients a data.frame or a data.table? data.table can handle considerably larger amounts of data than data.frame.
library(data.table)
clients<-data.table(clients)
clientsUnique<-unique(clients)
or
duplicateIndex <-duplicated(clients)
will give a logical vector that flags the rows duplicating an earlier row.
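A minimal sketch of the two calls above on a small made-up table. The columns id and city are hypothetical, not taken from the question. setDT() is used here instead of data.table() on the assumption that memory is tight: it converts the data frame in place rather than making a copy.

```r
library(data.table)

# Toy stand-in for `clients`; the column names are hypothetical.
clients <- data.frame(
  id   = c(1L, 2L, 2L, 3L, 1L),
  city = c("A", "B", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# Convert to a data.table by reference (the data is not copied).
setDT(clients)

# Logical vector: TRUE for every row that repeats an earlier row.
duplicateIndex <- duplicated(clients)

# Keep only the first occurrence of each distinct row.
clientsUnique <- unique(clients)
```

On a data.table, clients[!duplicateIndex] should give the same rows as unique(clients), so either form can be used depending on whether the duplicate flags are needed afterwards.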
Upvotes: 1
Reputation: 1022
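You could use the distinct() function from the dplyr package.
Usage: df %>% distinct(ID)
where ID is a column that uniquely identifies each record in your data frame. Note that distinct(ID) returns only the ID column; use distinct(ID, .keep_all = TRUE) to keep the other columns as well.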
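A small sketch of how this might look, using a made-up data frame (the city column and the values are hypothetical). It contrasts removing fully identical rows with deduplicating on ID alone:

```r
library(dplyr)

# Toy data: ID 2 appears twice with identical values,
# ID 3 appears twice with different cities.
df <- data.frame(
  ID   = c(1L, 2L, 2L, 3L, 3L),
  city = c("A", "B", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Drop rows that are identical across all columns
# (the equivalent of unique() on the whole data frame).
all_cols <- df %>% distinct()

# Deduplicate on ID only, keeping the first row seen for each ID
# together with its other columns.
by_id <- df %>% distinct(ID, .keep_all = TRUE)
```

Which variant fits depends on whether two rows with the same ID but different values should count as duplicates; the original unique(clients) call corresponds to the first form.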
Upvotes: 0