add-semi-colons

Reputation: 18800

R: Create a new data frame for each unique id

I have created a feature vector (a data.frame) with the columns id, feat1, feat2, feat3 and boolean. The data frame deliberately contains duplicate ids. As I iterate over this data frame, I want to build a new data frame for each id.

For simplicity, let's assume I have the following three columns:

          X1         X2      X3
1   000000001 -1.4061361     1
2   000000001 -0.1973846     1
3   000000002 -0.4385071     1
4   000000001 -0.6593677     0
5   000000001 -1.2592415     0
6   000000001 -0.5463655     1
7   000000002  0.4231117     0
8   000000002 -0.1640883     1
9   000000002  0.7157506     0
10  000000002  2.3234110     1

I want to build a separate data frame for each value of X1, i.e., put all rows with the same X1 into their own data frame. I wrote this using multiple for loops, but it takes a very long time because the data set is large. What is the best way to do this?
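To make the example reproducible, here is a sketch that rebuilds the sample table above as a data frame named mydf (the name one of the answers uses). It assumes X1 is stored as a number, as the split() output in the answers suggests; to keep the leading zeros, X1 would have to be character instead.

```r
# Rebuild the sample data shown above (assumption: X1 is numeric, not zero-padded text)
mydf <- data.frame(
  X1 = c(1, 1, 2, 1, 1, 1, 2, 2, 2, 2),
  X2 = c(-1.4061361, -0.1973846, -0.4385071, -0.6593677, -1.2592415,
         -0.5463655, 0.4231117, -0.1640883, 0.7157506, 2.3234110),
  X3 = c(1, 1, 1, 0, 0, 1, 0, 1, 0, 1)
)
str(mydf)
```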

Upvotes: 2

Views: 2641

Answers (3)

dmca

Reputation: 685

It sounds like you want to fit a model to each subset of the data (and probably extract summaries of those models). You can do this functionally with broom, dplyr, purrr and tidyr. Here's an example using the built-in mtcars data set:

library(broom)
library(dplyr)
library(purrr)
library(tidyr)

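mtcars %>%
  group_by(cyl) %>%                       # one group per number of cylinders
  nest() %>%                              # collapse each group's rows into a list-column `data`
  mutate(model = map(data, lm, formula = mpg ~ disp + hp),  # fit lm() to each group
         results = map(model, tidy)) %>%  # tidy coefficient table per model
  unnest(results)                         # one row per term per group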
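If you would rather not load four packages, a roughly equivalent base-R sketch splits the data, fits one model per piece with lapply(), and then collects the coefficients. This is a swapped-in alternative, not part of the answer above, and it returns a plain coefficient matrix rather than broom's tidy data frame.

```r
# Fit the same model, mpg ~ disp + hp, separately for each value of cyl
models <- lapply(split(mtcars, mtcars$cyl),
                 function(d) lm(mpg ~ disp + hp, data = d))

# One column of coefficients per cylinder group, one row per term
coefs <- sapply(models, coef)
coefs
```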

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

As suggested in the comments, use split. If you really want to have new objects created, use split in conjunction with list2env as follows:

## What is in the workspace presently?
ls()
# [1] "mydf"

## This is where most R users would probably stop
split(mydf, mydf$X1)
# $`1`
#   X1         X2 X3
# 1  1 -1.4061361  1
# 2  1 -0.1973846  1
# 4  1 -0.6593677  0
# 5  1 -1.2592415  0
# 6  1 -0.5463655  1
# 
# $`2`
#    X1         X2 X3
# 3   2 -0.4385071  1
# 7   2  0.4231117  0
# 8   2 -0.1640883  1
# 9   2  0.7157506  0
# 10  2  2.3234110  1

The above command creates a list, which is a very convenient format to have if you are going to be doing similar calculations on each list item. Most R users would stop there. If you really need separate objects in your workspace, use list2env:

list2env(split(mydf, mydf$X1), envir=.GlobalEnv)
# <environment: R_GlobalEnv>

## How many objects do we have now?
ls()
# [1] "1"    "2"    "mydf"

Note that these names are not syntactically valid, so you need to use backticks (`) to access them (or, alternatively, get("1")).

`1`
#   X1         X2 X3
# 1  1 -1.4061361  1
# 2  1 -0.1973846  1
# 4  1 -0.6593677  0
# 5  1 -1.2592415  0
# 6  1 -0.5463655  1
`2`
#    X1         X2 X3
# 3   2 -0.4385071  1
# 7   2  0.4231117  0
# 8   2 -0.1640883  1
# 9   2  0.7157506  0
# 10  2  2.3234110  1
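As an illustration of both points above, the sketch below keeps the list for per-group calculations with sapply(), and, if separate objects are really needed, prefixes the list names so list2env() creates syntactically valid names that need no backticks. The "df_" prefix is an arbitrary, hypothetical choice (make.names() is another option), and mydf is re-created from the question's table, assuming X1 is numeric, so the sketch runs on its own.

```r
# Stand-in for mydf, rebuilt from the question's table (X1 assumed numeric)
mydf <- data.frame(
  X1 = c(1, 1, 2, 1, 1, 1, 2, 2, 2, 2),
  X2 = c(-1.4061361, -0.1973846, -0.4385071, -0.6593677, -1.2592415,
         -0.5463655, 0.4231117, -0.1640883, 0.7157506, 2.3234110),
  X3 = c(1, 1, 1, 0, 0, 1, 0, 1, 0, 1)
)

groups <- split(mydf, mydf$X1)

# The same calculation on every group, with no explicit loop over ids
sizes <- sapply(groups, nrow)                    # rows per X1 value
means <- sapply(groups, function(d) mean(d$X2))  # mean of X2 per X1 value

# Prefix the names ("1", "2") so the new objects are valid R names
# ("df_" is a hypothetical prefix; any valid prefix works)
names(groups) <- paste0("df_", names(groups))
list2env(groups, envir = .GlobalEnv)

nrow(df_1)  # no backticks needed
```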

Upvotes: 3

Hillary Sanders

Reputation: 6047

This uses one for loop - better?

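ids <- unique(df$X1)

for (i in seq_along(ids)) {
    id <- ids[i]
    # Rows of df whose X1 equals the current id (already a data frame)
    mini.df <- df[df$X1 == id, ]
    # Creates mini.df.1, mini.df.2, ... in the current environment
    assign(paste("mini.df", i, sep = "."), mini.df)
    # Or, to name the data frames by id instead (the name must be character;
    # numeric ids give non-syntactic names that need backticks):
    # assign(as.character(id), mini.df)
}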
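A variant of the same loop that stores each subset in a preallocated, named list instead of calling assign() keeps everything in one object, the same structure split() returns. The name mini.dfs is hypothetical, and df is rebuilt from the question's table (X1 assumed numeric) so the sketch runs on its own.

```r
# Stand-in for df, rebuilt from the question's table (X1 assumed numeric)
df <- data.frame(
  X1 = c(1, 1, 2, 1, 1, 1, 2, 2, 2, 2),
  X2 = c(-1.4061361, -0.1973846, -0.4385071, -0.6593677, -1.2592415,
         -0.5463655, 0.4231117, -0.1640883, 0.7157506, 2.3234110),
  X3 = c(1, 1, 1, 0, 0, 1, 0, 1, 0, 1)
)
ids <- unique(df$X1)

# Preallocate one slot per id; names() coerces the numeric ids to "1", "2"
mini.dfs <- vector("list", length(ids))
names(mini.dfs) <- ids

for (i in seq_along(ids)) {
  # Store the subset for this id instead of creating a new variable
  mini.dfs[[i]] <- df[df$X1 == ids[i], ]
}

mini.dfs[["2"]]
```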

Upvotes: 1
