Jakob
Jakob

Reputation: 163

Create multiple data.frames from one - based on selection vector

I have a data.frame like the date stored in the following dput():

df <- structure(list(id = 1:10, obs_2008_1 = c(1152L, 2673L, 3727L, 
                                           4392L, 1342L, 7423L, 3179L, 6519L, 3524L, 7935L), obs_2008_2 = c(3110L, 
                                                                                                            1268L, 1757L, 9966L, 2611L, 3916L, 8972L, 3493L, 5538L, 7159L
                                           ), obs_2009_1 = c(6546L, 4666L, 9772L, 9982L, 6120L, 6909L, 1416L, 
                                                             1776L, 9186L, 7163L), obs_2009_2 = c(1592L, 4568L, 5582L, 9826L, 
                                                                                                  7420L, 5845L, 2629L, 6596L, 1436L, 8102L), obs_2010_1 = c(4829L, 
                                                                                                                                                            9190L, 4061L, 2007L, 8453L, 2670L, 4311L, 2018L, 2008L, 8303L
                                                                                                  ), obs_2010_2 = c(7051L, 4823L, 4350L, 4406L, 2626L, 5556L, 6716L, 
                                                                                                                    7597L, 6405L, 1700L), obs_2011_1 = c(4401L, 8174L, 9138L, 8296L, 
                                                                                                                                                         5538L, 2565L, 7245L, 5036L, 8277L, 8421L), obs_2011_2 = c(5546L, 
                                                                                                                                                                                                                   8311L, 9714L, 1428L, 3029L, 9221L, 7116L, 4620L, 4028L, 2223L
                                                                                                                                                         ), obs_2012_1 = c(8000L, 7778L, 4106L, 2810L, 3695L, 5910L, 7268L, 
                                                                                                                                                                           5813L, 9052L, 2032L), obs_2012_2 = c(9526L, 8099L, 4806L, 7093L, 
                                                                                                                                                                                                                9002L, 1801L, 3529L, 8401L, 3414L, 1154L), obs_2013_1 = c(5846L, 
                                                                                                                                                                                                                                                                          9976L, 6592L, 3463L, 1996L, 2860L, 9272L, 1617L, 4945L, 6295L
                                                                                                                                                                                                                ), obs_2013_2 = c(2504L, 8277L, 7397L, 9075L, 5465L, 2886L, 6587L, 
                                                                                                                                                                                                                                  7687L, 4397L, 6353L), obs_2014_1 = c(5215L, 3211L, 8115L, 5154L, 
                                                                                                                                                                                                                                                                       1607L, 9160L, 8967L, 4313L, 6589L, 3741L), obs_2014_2 = c(2288L, 
                                                                                                                                                                                                                                                                                                                                 5682L, 4552L, 3747L, 8444L, 8711L, 9595L, 6814L, 9314L, 9594L
                                                                                                                                                                                                                                                                       ), obs_2015_1 = c(2288L, 5682L, 4552L, 3747L, 8444L, 8711L, 9595L, 
                                                                                                                                                                                                                                                                                         6814L, 9314L, 9594L), obs_2015_2 = c(7558L, 8192L, 5014L, 1002L, 
                                                                                                                                                                                                                                                                                                                              9582L, 2138L, 5739L, 8448L, 7849L, 7607L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                                                                                                                                                                                              -10L))

It is basically yearly observations for regions 1:10. In the real data, there are much more than two columns per year. In a next step, I want to analyse development over time. Once for the whole period, and then for two-year steps in between. The tricky part is, that the data.frame is updated every year, so it get's "wider" on a regular basis.

The following code gives me a selection vector b with the id and all the years I want to analyse:

a <- colnames(df)
b <- str_remove_all(a[c(1, seq(2, length(a), 4), length(a))], "obs_")
b <- str_remove_all(b, "_[0-9]")

So in this case, b prints:

[1] "id"   "2008" "2010" "2012" "2014" "2015"

In a next step, I want to come up with multiple dataframes or a list with multiple dataframes that contain a selection based on my selection vector:

Ideally, the resulting data.frames would also include the range in their names, for example df_2008_2010, df_2010_2012,...

With this list, I could do my calculations for the development between every two years and between the full period. I made a few attempts with a loop, but they were not very promising.

Thanks for any help and suggestions!

Upvotes: 0

Views: 42

Answers (2)

cdalitz
cdalitz

Reputation: 1277

A general advice in data modeling is to define the structures in such a way that they will not change over time. The number of columns in your data frame should thus be fixed and when you add new data, rows should be added. In data base lingo, this means converting your model into the "first normal form".

With your data, this would mean to define a column year holding the year of the measurement. Thn you can extract all data from one year with

df(df$year=2015,)

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 389055

You can try this :

b <- b[-1]
cols <- names(df)

lapply(seq_along(b), function(i) {
  if(i == length(b)) inds <- b[c(i, 1)] else inds <- b[i:(i + 1)]
  cbind(df[1], df[grep(paste0(inds, collapse = '|'), cols)])
}) -> list_df

This returns :

list_df

[[1]]
#   id obs_2008_1 obs_2008_2 obs_2010_1 obs_2010_2
#1   1       1152       3110       4829       7051
#2   2       2673       1268       9190       4823
#3   3       3727       1757       4061       4350
#4   4       4392       9966       2007       4406
#5   5       1342       2611       8453       2626
#6   6       7423       3916       2670       5556
#7   7       3179       8972       4311       6716
#8   8       6519       3493       2018       7597
#9   9       3524       5538       2008       6405
#10 10       7935       7159       8303       1700

#[[2]]
#   id obs_2010_1 obs_2010_2 obs_2012_1 obs_2012_2
#1   1       4829       7051       8000       9526
#2   2       9190       4823       7778       8099
#3   3       4061       4350       4106       4806
#4   4       2007       4406       2810       7093
#5   5       8453       2626       3695       9002
#6   6       2670       5556       5910       1801
#7   7       4311       6716       7268       3529
#8   8       2018       7597       5813       8401
#9   9       2008       6405       9052       3414
#10 10       8303       1700       2032       1154

#...
#...

Upvotes: 1

Related Questions