Reputation: 235
I am looking for a way to store multiple instances of different variable types in R. I tried to use data frames (and lists) but cannot get them to do what I want. Let me try to show you with an example what I would like to achieve.
Let's say I create a data type (a basket) that contains a number and a string, like:
iNbLine = 2
df<-data.frame(Weight=double(iNbLine), Color=character(iNbLine),stringsAsFactors=F)
row.names(df)<-c("apples","pears")
df
Weight Color
apples 0
pears 0
I can now update my data structure as I want. For example :
df$Weight[1]=158
df$Color[1]="green"
df
Weight Color
apples 158 green
pears 0
What I would like to do, however, is have a higher-level data structure that contains several of these baskets along with additional data (here the price), so I tried this:
iNbBasket =5
df2<-data.frame(Price=double(iNbBasket), Basket=rep(df,iNbBasket))
But this gives me
Error in data.frame(Price = double(iNbBasket), Basket = rep(df, iNbBasket)) : arguments imply differing number of rows: 5, 2
What I would like to be able to do is access, for example, the weight of the apples in my 2nd basket, while keeping the possibility to set the price of the 2nd basket. I hope this is clear enough. In C, I think I was able to define a new data type (basket) using "struct", which I could then include in another data type, but I cannot figure out how to do it here.
For @joran this is an attempt to show what I would like :
Baskets
Name     Price  Names   Weight  Color
Basket1  250    apples  158     green
                pears   32      yellow
Basket2  120    apples  70      green
                pears   10      yellow
I would also like to be able to access line 3 with something like:
myBasket<-myData[2]
myBasket$Weight[1]
70
and do :
myBasket$Price = 130
Update 1: I looked into lists, S3 classes, and dplyr. I have to admit I did not understand everything, but so far I do not have exactly what I want. I currently do the following:
iNbLine = 2
df<-data.frame(Weight=double(iNbLine), Color=c("green","yellow"),stringsAsFactors=F)
row.names(df)<-c("apples","pears")
iNbBasket=3
dfBaskets<-data.frame(Price=double(iNbBasket))
row.names(dfBaskets)=c("Basket1","Basket2","Basket3")
lBasketsContent<-list()
for(i in 1:iNbBasket){
lBasketsContent[[i]]=df
}
This way I can access the price :
iBasket =2
dfBaskets$Price[iBasket] = 150
and any element of a given basket :
lBasketsContent[[2]]$Weight[1] = 300
as well as the basket itself (I pass it to a function in my real case)
dfBasket<-lBasketsContent[[2]]
It is easy to read but requires 2 containers.
Upvotes: 1
Views: 1777
Reputation: 160407
Hadley's tidyr (with purrr) provides something like this. Take a look at "tidyr 0.4.0" for a demonstration of complex structures nested within a data.frame cell.
Their examples typically rely on having relevant information in the other cells before populating the rest, and even then they populate them based on some form of grouping. For example, using mtcars:
library(dplyr)
library(tidyr)
library(purrr)
mtcars %>%
transmute(model = rownames(mtcars), mpg, cyl, disp, gear) %>%
group_by(cyl)
# Source: local data frame [32 x 5]
# Groups: cyl [3]
# model mpg cyl disp gear
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Mazda RX4 21.0 6 160.0 4
# 2 Mazda RX4 Wag 21.0 6 160.0 4
# 3 Datsun 710 22.8 4 108.0 4
# 4 Hornet 4 Drive 21.4 6 258.0 3
# 5 Hornet Sportabout 18.7 8 360.0 3
# 6 Valiant 18.1 6 225.0 3
# 7 Duster 360 14.3 8 360.0 3
# 8 Merc 240D 24.4 4 146.7 4
# 9 Merc 230 22.8 4 140.8 4
# 10 Merc 280 19.2 6 167.6 4
# # ... with 22 more rows
If we call nest() on a grouping, you can see how things are compacted a bit:
quux1 <- mtcars %>%
transmute(model = rownames(mtcars), mpg, cyl, disp, gear) %>%
group_by(cyl) %>%
nest()
quux1
# # A tibble: 3 x 2
# cyl data
# <dbl> <list>
# 1 6 <tibble [7 x 4]>
# 2 4 <tibble [11 x 4]>
# 3 8 <tibble [14 x 4]>
quux1$data[[1]]
# # A tibble: 7 x 4
# model mpg disp gear
# <chr> <dbl> <dbl> <dbl>
# 1 Mazda RX4 21.0 160.0 4
# 2 Mazda RX4 Wag 21.0 160.0 4
# 3 Hornet 4 Drive 21.4 258.0 3
# 4 Valiant 18.1 225.0 3
# 5 Merc 280 19.2 167.6 4
# 6 Merc 280C 17.8 167.6 4
# 7 Ferrari Dino 19.7 145.0 5
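Applied to the baskets from the question, the same idea might look like the sketch below. The flat table and the object names (flat, baskets, b2) are made up for illustration, and the values are copied from the example table in the question. It assumes that dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Flat version of the question's example table: one row per fruit.
flat <- tibble(
  Basket = c("Basket1", "Basket1", "Basket2", "Basket2"),
  Price  = c(250, 250, 120, 120),
  Names  = c("apples", "pears", "apples", "pears"),
  Weight = c(158, 32, 70, 10),
  Color  = c("green", "yellow", "green", "yellow")
)

# One row per basket; the fruit rows are nested in the list-column `data`.
baskets <- flat %>%
  group_by(Basket, Price) %>%
  nest() %>%
  ungroup()

# Look the basket up by name rather than relying on row order.
b2 <- which(baskets$Basket == "Basket2")

# Weight of the apples (first nested row) in Basket2.
apples_b2 <- baskets$data[[b2]]$Weight[1]

# Set the price of Basket2.
baskets$Price[b2] <- 130
```

Each element of baskets$data is an ordinary tibble, so it can be passed to a function on its own.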
You can do some processing on this, dplyr-style:
quux2 <- mtcars %>%
transmute(model = rownames(mtcars), mpg, cyl, disp, gear) %>%
group_by(cyl) %>%
nest() %>%
mutate(mpg2 = purrr::map(data, ~ lm(mpg ~ disp + gear, data = .)))
quux2
# # A tibble: 3 x 3
# cyl data mpg2
# <dbl> <list> <list>
# 1 6 <tibble [7 x 4]> <S3: lm>
# 2 4 <tibble [11 x 4]> <S3: lm>
# 3 8 <tibble [14 x 4]> <S3: lm>
And deal with the models individually:
summary(quux2$mpg2[[2]])
# Call:
# lm(formula = mpg ~ disp + gear, data = .)
# Residuals:
# Min 1Q Median 3Q Max
# -3.2691 -1.7130 0.0708 1.7617 3.4351
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 30.77234 7.33123 4.197 0.00301 **
# disp -0.13189 0.03094 -4.263 0.00275 **
# gear 2.38529 1.54132 1.548 0.16032
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual standard error: 2.623 on 8 degrees of freedom
# Multiple R-squared: 0.7294, Adjusted R-squared: 0.6618
# F-statistic: 10.78 on 2 and 8 DF, p-value: 0.005361
A more robust use of this would deal with the models programmatically, of course, but this is just a start.
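As a rough idea of what "programmatically" could mean, here is a minimal sketch that collects the coefficients and R-squared of every per-cylinder model in one pass. It uses base R (split() and lapply()) as a stand-in for the nest()/map() pipeline so that it runs without extra packages; the names models, coefs and r2 are only illustrative.

```r
# Fit the same model once per cylinder group.
models <- lapply(
  split(mtcars, mtcars$cyl),
  function(d) lm(mpg ~ disp + gear, data = d)
)

# One row per group ("4", "6", "8"), one column per coefficient.
coefs <- t(sapply(models, coef))

# R-squared for each group, as a named numeric vector.
r2 <- sapply(models, function(m) summary(m)$r.squared)

coefs
r2
```

The row for the 4-cylinder group should agree with the summary() output shown above.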
NB: I am not suggesting that mpg ~ disp + gear is a reasonable model :-)
Update 1
How about this:
Start with "default" basket contents, a hybrid list/data.frame:
df <- list(Price = 0,
Contents = data.frame(Names = c("apples", "pears"),
Weight = rep(0, 2L),
Color = c("green","yellow"),
stringsAsFactors = F)
)
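If you prefer something closer to a C struct definition, the default basket could be wrapped in a small constructor. This is only a sketch: new_basket() is a hypothetical helper, not part of any package, and its arguments are my own choice.

```r
# Hypothetical constructor: one basket is a list holding a scalar price
# and a data.frame with one row per item.
new_basket <- function(names, colors, price = 0) {
  stopifnot(length(names) == length(colors), length(price) == 1L)
  list(
    Price = price,
    Contents = data.frame(
      Names  = names,
      Weight = rep(0, length(names)),
      Color  = colors,
      stringsAsFactors = FALSE
    )
  )
}

b <- new_basket(c("apples", "pears"), c("green", "yellow"))
str(b)
```

The object returned has the same shape as the hand-built default, so the steps that follow work with either.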
Create a list of three baskets (three customers?):
nBaskets <- 3L
# start with 3 empty baskets
lBaskets <- replicate(nBaskets, df, simplify = FALSE)
str(lBaskets)
# List of 3
# $ :List of 2
# ..$ Price : num 0
# ..$ Contents:'data.frame': 2 obs. of 3 variables:
# .. ..$ Names : chr [1:2] "apples" "pears"
# .. ..$ Weight: num [1:2] 0 0
# .. ..$ Color : chr [1:2] "green" "yellow"
# $ :List of 2
# ..$ Price : num 0
# ..$ Contents:'data.frame': 2 obs. of 3 variables:
# .. ..$ Names : chr [1:2] "apples" "pears"
# .. ..$ Weight: num [1:2] 0 0
# .. ..$ Color : chr [1:2] "green" "yellow"
# $ :List of 2
# ..$ Price : num 0
# ..$ Contents:'data.frame': 2 obs. of 3 variables:
# .. ..$ Names : chr [1:2] "apples" "pears"
# .. ..$ Weight: num [1:2] 0 0
# .. ..$ Color : chr [1:2] "green" "yellow"
Now, customer 2 wants to buy something:
cust <- 2
lBaskets[[ cust ]]$Contents$Weight[1] <- 300
lBaskets[[ cust ]]$Price <- 150
lBaskets[[ cust ]]
# $Price
# [1] 150
# $Contents
# Names Weight Color
# 1 apples 300 green
# 2 pears 0 yellow
Without getting into S4 objects (perhaps over-engineered for what you are trying to do), I think this is the most straightforward way. If you want or need to make a quick reference to a specific customer's Contents and reassign it back into the list, that's certainly doable but not required.
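For completeness, a minimal sketch of that "take it out, change it, put it back" pattern. The structure is rebuilt here so the snippet runs on its own; the names contents, prices and total_weights are only illustrative. Note that R copies on assignment, so the extracted data.frame is a copy and the list only changes once you assign it back.

```r
# Rebuild the default basket and the list of three baskets.
df <- list(
  Price = 0,
  Contents = data.frame(
    Names  = c("apples", "pears"),
    Weight = rep(0, 2L),
    Color  = c("green", "yellow"),
    stringsAsFactors = FALSE
  )
)
lBaskets <- replicate(3L, df, simplify = FALSE)

cust <- 2

# Extract a copy of one customer's contents and modify it.
contents <- lBaskets[[cust]]$Contents
contents$Weight[contents$Names == "pears"] <- 120

# The list is unchanged until the copy is written back.
lBaskets[[cust]]$Contents <- contents

# Summaries across all baskets.
prices <- sapply(lBaskets, function(b) b$Price)
total_weights <- sapply(lBaskets, function(b) sum(b$Contents$Weight))
```

The same pattern applies when a function modifies a basket: have it return the modified basket and assign the result back into lBaskets.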
Upvotes: 1
Reputation: 459
Use lists. Lists are generic vectors able to contain other objects.
Using a data.table, for example:
> library(data.table)
> baskets = data.table(
'name'=c('basket1','basket2'),
'price'=c(250,120),
'names'=list( list('apples','pears') , list('apples','pears') ),
'weight'=list( list(158,32) , list(70,10) ),
'color'=list( list('green','yellow') , list('green','yellow'))
)
> baskets
name price names weight color
1: basket1 250 <list> <list> <list>
2: basket2 120 <list> <list> <list>
Grabbing the first row's information:
> baskets[1][['price']]
[1] 250
> baskets[1][['names']][[1]][[2]]
[1] "pears"
> baskets[1][['weight']][[1]][[2]]
[1] 32
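To tie this back to the question (weight of the apples in the 2nd basket, and setting that basket's price), something like the following sketch should work. It assumes data.table is installed and rebuilds the table so it runs on its own; apples_b2 is just an illustrative name.

```r
library(data.table)

baskets <- data.table(
  name   = c("basket1", "basket2"),
  price  = c(250, 120),
  names  = list(list("apples", "pears"), list("apples", "pears")),
  weight = list(list(158, 32), list(70, 10)),
  color  = list(list("green", "yellow"), list("green", "yellow"))
)

# Weight of the apples (first item) in basket2: select the row in i,
# then unwrap the list-column cell and take its first element in j.
apples_b2 <- baskets[name == "basket2", weight[[1]][[1]]]

# Update the price of basket2 by reference.
baskets[name == "basket2", price := 130]
```

Selecting by name rather than by row number keeps the code valid if the rows are reordered.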
Upvotes: 1
Reputation: 3597
We need to convert the first data.frame to a matrix and use byrow=TRUE:
iNbLine = 2
DF<-data.frame(Weight=double(iNbLine), Color=character(iNbLine),stringsAsFactors=F)
row.names(DF)<-c("apples","pears")
DF
DF$Weight[1]=158
DF$Color[1]="green"
DF$Weight[2]=200
DF$Color[2]="red"
iNbBasket =5
DF_MultiLevel<-data.frame(Price=double(iNbBasket), Basket= matrix(rep(DF,iNbBasket),nrow=iNbBasket,byrow=TRUE) )
#> DF_MultiLevel
# Price Basket.1 Basket.2
#1 0 158, 200 green, red
#2 0 158, 200 green, red
#3 0 158, 200 green, red
#4 0 158, 200 green, red
#5 0 158, 200 green, red
Upvotes: 0