mlegge

Reputation: 6913

Retaining 'by' variables with by function

When splitting a data frame with by(), the grouping ('by') variables are printed, but they are not retained as columns in the resulting pieces.

    data(iris)
    dflist <- by(iris[,1:4], iris[,"Species"], data.frame)
    head(dflist[[1]])

      Sepal.Length Sepal.Width Petal.Length Petal.Width
    1          5.1         3.5          1.4         0.2
    2          4.9         3.0          1.4         0.2
    3          4.7         3.2          1.3         0.2
    4          4.6         3.1          1.5         0.2
    5          5.0         3.6          1.4         0.2

Is it possible to retain the grouping variable as a column, as below?

        Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    1            5.1         3.5          1.4         0.2     setosa
    2            4.9         3.0          1.4         0.2     setosa
    3            4.7         3.2          1.3         0.2     setosa
    4            4.6         3.1          1.5         0.2     setosa
    5            5.0         3.6          1.4         0.2     setosa

Or is there a better way to group the data by certain variables into a list object?

Upvotes: 0

Views: 43

Answers (3)

jed

Reputation: 615

Is this what you want?

    species_list <- split(iris, iris$Species, drop = FALSE)

Upvotes: 2

MrFlick

Reputation: 206466

If you want to keep the Species column, then you just have to ask for it. Right now you are explicitly removing it by selecting only columns 1:4.

    dflist <- by(iris[,1:5], iris[,"Species"], data.frame)
    head(dflist[[1]])

    #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    # 1          5.1         3.5          1.4         0.2  setosa
    # 2          4.9         3.0          1.4         0.2  setosa
    # 3          4.7         3.2          1.3         0.2  setosa
    # 4          4.6         3.1          1.5         0.2  setosa
    # 5          5.0         3.6          1.4         0.2  setosa
    # 6          5.4         3.9          1.7         0.4  setosa

Or, at this point, since you are just splitting the data and not applying a function,

    dflist <- split(iris, iris[,"Species"])

would work just as well.
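
As a quick sanity check, here is a minimal sketch (the variable name `dflist` is reused from the question; nothing else is assumed beyond base R and the built-in `iris` data) confirming that `split` gives one data frame per species and keeps the grouping column:

```r
# Split iris by Species; each piece keeps every column, including Species
dflist <- split(iris, iris$Species)

names(dflist)                      # "setosa" "versicolor" "virginica"
sapply(dflist, nrow)               # 50 rows in each group
"Species" %in% names(dflist[[1]])  # TRUE: the grouping column is retained
```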

Upvotes: 4

tkmckenzie

Reputation: 1363

split might do what you're looking for:

    split(iris, iris$Species)
    # $setosa
    #    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    # 1           5.1         3.5          1.4         0.2  setosa
    # 2           4.9         3.0          1.4         0.2  setosa
    # 3           4.7         3.2          1.3         0.2  setosa
    # 4           4.6         3.1          1.5         0.2  setosa
    # 5           5.0         3.6          1.4         0.2  setosa
    # ...
    # $versicolor
    #     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    # 51           7.0         3.2          4.7         1.4 versicolor
    # 52           6.4         3.2          4.5         1.5 versicolor
    # 53           6.9         3.1          4.9         1.5 versicolor
    # 54           5.5         2.3          4.0         1.3 versicolor
    # 55           6.5         2.8          4.6         1.5 versicolor
    # ...

Upvotes: 4
