Djiggy

Reputation: 235

Multi-level data frame

I am looking for a way to store multiple instances of different variable types in R. I tried using data frames (and lists) but cannot get them to do what I want. Let me show with an example what I would like to achieve.

Let's say I create a data type (a basket) that contains a number and a string, like:

iNbLine = 2
df<-data.frame(Weight=double(iNbLine), Color=character(iNbLine),stringsAsFactors=F)
row.names(df)<-c("apples","pears")
df
       Weight Color
apples      0      
pears       0    

I can now update my data structure as I want. For example :

df$Weight[1]=158
df$Color[1]="green"
df
       Weight Color
apples    158 green
pears       0    

What I would like to do, however, is have a higher-level data structure that contains several of these baskets along with additional data (here, the price), so I tried this:

iNbBasket =5
df2<-data.frame(Price=double(iNbBasket), Basket=rep(df,iNbBasket))

But this gives me

Error in data.frame(Price = double(iNbBasket), Basket = rep(df, iNbBasket)) : arguments imply differing number of rows: 5, 2

What I would like to be able to do is access, for example, the weight of the apples in my 2nd basket, while still being able to set the price of the 2nd basket. I hope this is clear enough. In C, I believe I could define a new data type (basket) using "struct" and then include it in another data type, but I cannot figure out how to do it here.

For @joran this is an attempt to show what I would like :

                       Baskets      
Name    Price   Names   Weight  Color
Basket1 250     apples  158     green
                pears   32      yellow
Basket2 120     apples  70      green
                pears   10      yellow

But being able to access line 3, by something like :

myBasket<-myData[2]
myBasket$Weight[1]
70

and do :

myBasket$Price = 130

Update 1: I looked into lists, S3 classes, and dplyr. I have to admit I did not understand everything, but so far I do not have exactly what I want. I currently do the following:

iNbLine = 2
df<-data.frame(Weight=double(iNbLine), Color=c("green","yellow"),stringsAsFactors=F)
row.names(df)<-c("apples","pears")

iNbBasket=3
dfBaskets<-data.frame(Price=double(iNbBasket))
row.names(dfBaskets)=c("Basket1","Basket2","Basket3")

lBasketsContent<-list()
for(i in 1:iNbBasket){
    lBasketsContent[[i]]=df
}

This way I can access the price :

iBasket =2
dfBaskets$Price[2] = 150

and any element of a given basket :

lBasketsContent[[2]]$Weight[1] = 300

as well as the basket itself (I pass it to a function in my real case)

dfBasket<-lBasketsContent[[2]]

It is easy to read but requires 2 containers.

Upvotes: 1

Views: 1777

Answers (3)

r2evans

Reputation: 160407

Hadley's tidyr (with purrr) provides something like this. Take a look at "tidyr 0.4.0" for a demonstration of complex structures nested within a data.frame cell.

Their examples typically rely on having relevant information in some cells before populating the others, and even then the nested cells are populated based on some form of grouping. For example, using mtcars:

library(dplyr)
library(tidyr)
library(purrr)

mtcars %>%
  transmute(model = rownames(mtcars), mpg, cyl, disp, gear) %>%
  group_by(cyl)
# Source: local data frame [32 x 5]
# Groups: cyl [3]
#                model   mpg   cyl  disp  gear
#                <chr> <dbl> <dbl> <dbl> <dbl>
# 1          Mazda RX4  21.0     6 160.0     4
# 2      Mazda RX4 Wag  21.0     6 160.0     4
# 3         Datsun 710  22.8     4 108.0     4
# 4     Hornet 4 Drive  21.4     6 258.0     3
# 5  Hornet Sportabout  18.7     8 360.0     3
# 6            Valiant  18.1     6 225.0     3
# 7         Duster 360  14.3     8 360.0     3
# 8          Merc 240D  24.4     4 146.7     4
# 9           Merc 230  22.8     4 140.8     4
# 10          Merc 280  19.2     6 167.6     4
# # ... with 22 more rows

If we call nest() on a grouping, you can see how things are compacted a bit:

quux1 <- mtcars %>%
  transmute(model = rownames(mtcars), mpg, cyl, disp, gear) %>%
  group_by(cyl) %>%
  nest()
quux1
# # A tibble: 3 x 2
#     cyl              data
#   <dbl>            <list>
# 1     6  <tibble [7 x 4]>
# 2     4 <tibble [11 x 4]>
# 3     8 <tibble [14 x 4]>
quux1$data[[1]]
# # A tibble: 7 x 4
#            model   mpg  disp  gear
#            <chr> <dbl> <dbl> <dbl>
# 1      Mazda RX4  21.0 160.0     4
# 2  Mazda RX4 Wag  21.0 160.0     4
# 3 Hornet 4 Drive  21.4 258.0     3
# 4        Valiant  18.1 225.0     3
# 5       Merc 280  19.2 167.6     4
# 6      Merc 280C  17.8 167.6     4
# 7   Ferrari Dino  19.7 145.0     5

You can do some processing on this, dplyr-style:

quux2 <- mtcars %>%
  transmute(model = rownames(mtcars), mpg, cyl, disp, gear) %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(mpg2 = purrr::map(data, ~ lm(mpg ~ disp + gear, data = .)))
quux2
# # A tibble: 3 x 3
#     cyl              data     mpg2
#   <dbl>            <list>   <list>
# 1     6  <tibble [7 x 4]> <S3: lm>
# 2     4 <tibble [11 x 4]> <S3: lm>
# 3     8 <tibble [14 x 4]> <S3: lm>

And deal with the models individually:

summary(quux2$mpg2[[2]])
# Call:
# lm(formula = mpg ~ disp + gear, data = .)
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.2691 -1.7130  0.0708  1.7617  3.4351 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)   
# (Intercept) 30.77234    7.33123   4.197  0.00301 **
# disp        -0.13189    0.03094  -4.263  0.00275 **
# gear         2.38529    1.54132   1.548  0.16032   
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual standard error: 2.623 on 8 degrees of freedom
# Multiple R-squared:  0.7294,  Adjusted R-squared:  0.6618 
# F-statistic: 10.78 on 2 and 8 DF,  p-value: 0.005361

A more robust use of this would deal with the models programmatically, of course, but this is just a start.
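As a rough sketch of what "programmatically" could mean here, the fitted models can be summarised with purrr's typed map functions instead of being inspected one at a time. The column names fit, r2 and disp_est are made up for this illustration, and the sketch assumes reasonably recent versions of dplyr, tidyr and purrr:

```r
library(dplyr)
library(tidyr)
library(purrr)

models <- mtcars %>%
  transmute(model = rownames(mtcars), mpg, cyl, disp, gear) %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    # one lm per nested tibble, stored in a list column
    fit      = map(data, ~ lm(mpg ~ disp + gear, data = .x)),
    # pull one number out of each model
    r2       = map_dbl(fit, ~ summary(.x)$r.squared),
    disp_est = map_dbl(fit, ~ coef(.x)[["disp"]])
  )

# one row per cyl group, with its R-squared and disp coefficient
models %>% select(cyl, r2, disp_est)
```

The row for cyl == 4 should agree with the summary() output shown above (R-squared of about 0.7294).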

NB: I am not suggesting that mpg ~ disp + gear is a reasonable model :-)
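Applied to the baskets in the question, the same nesting idea might look like the sketch below. It assumes tidyr 1.0 or later (for the nest(new_col = c(...)) syntax), and the names flat, baskets and Contents are only illustrative:

```r
library(dplyr)
library(tidyr)

# Flat version of the table sketched in the question: one row per fruit
flat <- tibble(
  Basket = rep(c("Basket1", "Basket2"), each = 2),
  Price  = rep(c(250, 120), each = 2),
  Names  = rep(c("apples", "pears"), times = 2),
  Weight = c(158, 32, 70, 10),
  Color  = rep(c("green", "yellow"), times = 2)
)

# Collapse the per-fruit columns into one list column: one row per basket
baskets <- flat %>% nest(Contents = c(Names, Weight, Color))

baskets$Contents[[2]]$Weight[1]   # weight of apples in the 2nd basket: 70
baskets$Price[2] <- 130           # set the price of the 2nd basket
```

Each element of baskets$Contents is an ordinary tibble, so it can be pulled out with [[ and passed to a function like any other data frame.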

Update 1

How about this:

Start with "default" basket contents, a hybrid list/data.frame:

df <- list(Price = 0,
           Contents = data.frame(Names = c("apples", "pears"),
                                 Weight = rep(0, 2L),
                                 Color = c("green","yellow"),
                                 stringsAsFactors = F)
           )

Create a list of three baskets (three customers?):

nBaskets <- 3L
# start with 3 empty baskets
lBaskets <- replicate(nBaskets, df, simplify = FALSE)
str(lBaskets)
# List of 3
#  $ :List of 2
#   ..$ Price   : num 0
#   ..$ Contents:'data.frame':  2 obs. of  3 variables:
#   .. ..$ Names : chr [1:2] "apples" "pears"
#   .. ..$ Weight: num [1:2] 0 0
#   .. ..$ Color : chr [1:2] "green" "yellow"
#  $ :List of 2
#   ..$ Price   : num 0
#   ..$ Contents:'data.frame':  2 obs. of  3 variables:
#   .. ..$ Names : chr [1:2] "apples" "pears"
#   .. ..$ Weight: num [1:2] 0 0
#   .. ..$ Color : chr [1:2] "green" "yellow"
#  $ :List of 2
#   ..$ Price   : num 0
#   ..$ Contents:'data.frame':  2 obs. of  3 variables:
#   .. ..$ Names : chr [1:2] "apples" "pears"
#   .. ..$ Weight: num [1:2] 0 0
#   .. ..$ Color : chr [1:2] "green" "yellow"

Now, customer 2 wants to buy something:

cust <- 2
lBaskets[[ cust ]]$Contents$Weight[1] <- 300
lBaskets[[ cust ]]$Price <- 150
lBaskets[[ cust ]]
# $Price
# [1] 150
# $Contents
#    Names Weight  Color
# 1 apples    300  green
# 2  pears      0 yellow

Without getting into S4 objects (perhaps over-engineered for what you are trying to do), I think this is the most straightforward way. If you want or need to take a quick reference to a specific customer's Contents and assign it back into the list, that's certainly doable but not required.
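For the "pull it out and put it back" case, a minimal sketch follows. It rebuilds the list of baskets from above under the name basket instead of df, and set_weight is a hypothetical helper, not something from a package. The point to keep in mind is that R copies on modification, so the extracted Contents is an independent copy and must be written back explicitly:

```r
# Rebuild the list of baskets from the example above
basket <- list(Price = 0,
               Contents = data.frame(Names = c("apples", "pears"),
                                     Weight = c(0, 0),
                                     Color = c("green", "yellow"),
                                     stringsAsFactors = FALSE))
lBaskets <- replicate(3L, basket, simplify = FALSE)

cust <- 2
contents <- lBaskets[[cust]]$Contents              # a copy, not a reference
contents$Weight[contents$Names == "pears"] <- 40   # edit the copy
lBaskets[[cust]]$Contents <- contents              # write it back

# Hypothetical helper: returns a modified copy of the whole list
set_weight <- function(baskets, i, fruit, weight) {
  rows <- baskets[[i]]$Contents$Names == fruit
  baskets[[i]]$Contents$Weight[rows] <- weight
  baskets
}
lBaskets <- set_weight(lBaskets, 1, "apples", 158)
```

Because the helper returns a new list rather than modifying its argument in place, the result has to be assigned back to lBaskets, as in the last line.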

Upvotes: 1

Phillip Stich

Reputation: 459

Use lists. Lists are generic vectors that can contain other objects, including other lists.

Using a data.table for example:

> library(data.table)
> baskets = data.table( 
    'name'=c('basket1','basket2'),
    'price'=c(250,120),
    'names'=list( list('apples','pears') , list('apples','pears') ),
    'weight'=list( list(158,32) , list(70,10) ),
    'color'=list( list('green','yellow') , list('green','yellow'))
)

> baskets
      name price  names weight  color
1: basket1   250 <list> <list> <list>
2: basket2   120 <list> <list> <list>

Grabbing information from the first row:

> baskets[1][['price']]
[1] 250
> baskets[1][['names']][[1]][[2]]
[1] "pears"
> baskets[1][['weight']][[1]][[2]]
[1] 32
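A variation on the same idea, sketched under the assumption that keeping each basket's contents together is preferable to three parallel list columns: store one small data.frame per basket in a single list column. The helper make_contents and the column name contents are made up for this example:

```r
library(data.table)

# Hypothetical helper that builds the contents of one basket
make_contents <- function(weights) {
  data.frame(Names = c("apples", "pears"),
             Weight = weights,
             Color = c("green", "yellow"),
             stringsAsFactors = FALSE)
}

baskets <- data.table(
  name     = c("basket1", "basket2"),
  price    = c(250, 120),
  # a list column: one data.frame per row
  contents = list(make_contents(c(158, 32)), make_contents(c(70, 10)))
)

baskets$contents[[2]]$Weight[1]            # apples in the 2nd basket: 70
baskets[name == "basket2", price := 130]   # update a price by reference
```

The price update uses data.table's := operator, which modifies the table in place, so no reassignment is needed.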

Upvotes: 1

Silence Dogood

Reputation: 3597

We need to convert the repeated first data.frame to a matrix, filled with byrow=TRUE, so that it has one row per basket:

iNbLine = 2
DF<-data.frame(Weight=double(iNbLine), Color=character(iNbLine),stringsAsFactors=F)
row.names(DF)<-c("apples","pears")
DF

DF$Weight[1]=158
DF$Color[1]="green"
DF$Weight[2]=200
DF$Color[2]="red"


iNbBasket =5
DF_MultiLevel<-data.frame(Price=double(iNbBasket), Basket= matrix(rep(DF,iNbBasket),nrow=iNbBasket,byrow=TRUE) )

#> DF_MultiLevel
#  Price Basket.1   Basket.2
#1     0 158, 200 green, red
#2     0 158, 200 green, red
#3     0 158, 200 green, red
#4     0 158, 200 green, red
#5     0 158, 200 green, red

Upvotes: 0
