RobertP.

Reputation: 285

r - subsetting dataframe creates factors

I have a huge data frame (call it huge) that I would like to split in two by row number. However, the way I do it turns the resulting subsets into large factors instead of data frames.

list1 <- huge[c(1:8175),]
list2 <- huge[c(8176:nrow(huge)),]

class(list1)
[1] "factor"

Can someone explain why this happens and how I can prevent it?

Upvotes: 1

Views: 129

Answers (1)

www

Reputation: 39154

You are probably subsetting a one-column data frame. Consider the following example.

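# Create an example data frame
# (stringsAsFactors = TRUE reproduces the factor column shown below on R >= 4.0.0,
# where the default changed to FALSE)
dt <- data.frame(a = 1:5, b = letters[1:5], stringsAsFactors = TRUE)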
dt

#   a b
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d
# 5 5 e

str(dt)

# 'data.frame': 5 obs. of  2 variables:
#  $ a: int  1 2 3 4 5
#  $ b: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5

# Subset the data frame
list1 <- dt[1:2, ]
list2 <- dt[3:nrow(dt), ]

class(list1)
# [1] "data.frame"

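Subsetting dt works as expected. However, when I create a one-column data frame from dt and subset it by row, the output is automatically simplified to a vector (here a factor), because subsetting a data frame with [ , ] uses drop = TRUE by default when the result has a single column.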

# Create a one-column data frame
dt2 <- dt[, 2, drop = FALSE]

# Subset the data frame
list3 <- dt2[1:2, ]
list4 <- dt2[3:nrow(dt2), ]

class(list3)
# [1] "factor"
list3
# [1] a b
# Levels: a b c d e

The solution is to add drop = FALSE when subsetting, which keeps the output as a data frame.

# Subset the data frame
list5 <- dt2[1:2, , drop = FALSE]
class(list5)
# [1] "data.frame"
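
Applied to the split in the question, this might look like the sketch below. The data frame built here is only a hypothetical stand-in for huge (its column name x and its contents are assumptions, since the real data is not shown); the split point 8175 is taken from the question.

```r
# Hypothetical stand-in for the asker's one-column data frame `huge`
huge <- data.frame(x = factor(sample(letters, 10000, replace = TRUE)))

# A single column is what triggers the simplification to a vector
ncol(huge)
# [1] 1

split_at <- 8175  # last row of the first part, as in the question

# drop = FALSE keeps each part a data frame even with a single column
part1 <- huge[seq_len(split_at), , drop = FALSE]
part2 <- huge[(split_at + 1):nrow(huge), , drop = FALSE]

class(part1)
# [1] "data.frame"
class(part2)
# [1] "data.frame"

# Together the two parts cover every row of the original
nrow(part1) + nrow(part2) == nrow(huge)
# [1] TRUE
```

Adding drop = FALSE is harmless when the data frame has several columns, so it can be used routinely whenever the result must stay a data frame.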

Upvotes: 2
