K.Hua

Reputation: 799

Downsize the object memory by subsetting a data frame in R

I'm using the dataset from https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe/downloads/515k-hotel-reviews-data-in-europe.zip/1, and I don't understand why subsetting the data frame barely reduces the object's size.

df = read.csv('Hotel_Reviews.csv')
object.size(df)

200503848 bytes

object.size(df[sample(1:nrow(df),500),])

157225848 bytes

By taking about 0.1% of the rows, I only reduced the object to roughly 78% of its original size. I don't understand why.

Upvotes: 2


Answers (1)

K.Hua


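OK, after looking at it more closely, it seems the cause is that the text columns of my data frame were read in as factors (the read.csv default before R 4.0.0). Subsetting a factor keeps all of its original levels, including the ones that no longer appear in the subset, so most of the memory is still taken up by the level labels. Reading the strings as plain character vectors avoids this: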

df = read.csv('Hotel_Reviews.csv',stringsAsFactors = FALSE)
object.size(df)

210584168 bytes

object.size(df[sample(1:nrow(df),500),])

394464 bytes
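
If the columns need to stay factors, the unused levels can probably be dropped after subsetting with droplevels() instead. Here is a minimal sketch on made-up data (the toy data frame and its column names are hypothetical stand-ins, not the hotel reviews file), so the sizes will not match the numbers above:

```r
# Toy data standing in for the real file (hypothetical columns).
set.seed(1)
n <- 10000
toy <- data.frame(
  id = seq_len(n),
  # One distinct label per row, so the factor has n levels.
  review = factor(paste("review text number", seq_len(n)))
)

# Take a small random subset of the rows.
sub <- toy[sample(nrow(toy), 10), ]

# The subset still carries every level of the original factor.
nlevels(sub$review)

# droplevels() keeps only the levels that actually occur in the subset.
sub_small <- droplevels(sub)
nlevels(sub_small$review)

# Compare the memory footprint before and after dropping unused levels.
object.size(sub)
object.size(sub_small)
```

The same idea should apply to the real data with droplevels(df[sample(1:nrow(df),500),]), though I have not measured it there.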

Upvotes: 2
