Reputation: 163
I have a data.frame like the date stored in the following dput()
:
df <- structure(list(id = 1:10, obs_2008_1 = c(1152L, 2673L, 3727L,
4392L, 1342L, 7423L, 3179L, 6519L, 3524L, 7935L), obs_2008_2 = c(3110L,
1268L, 1757L, 9966L, 2611L, 3916L, 8972L, 3493L, 5538L, 7159L
), obs_2009_1 = c(6546L, 4666L, 9772L, 9982L, 6120L, 6909L, 1416L,
1776L, 9186L, 7163L), obs_2009_2 = c(1592L, 4568L, 5582L, 9826L,
7420L, 5845L, 2629L, 6596L, 1436L, 8102L), obs_2010_1 = c(4829L,
9190L, 4061L, 2007L, 8453L, 2670L, 4311L, 2018L, 2008L, 8303L
), obs_2010_2 = c(7051L, 4823L, 4350L, 4406L, 2626L, 5556L, 6716L,
7597L, 6405L, 1700L), obs_2011_1 = c(4401L, 8174L, 9138L, 8296L,
5538L, 2565L, 7245L, 5036L, 8277L, 8421L), obs_2011_2 = c(5546L,
8311L, 9714L, 1428L, 3029L, 9221L, 7116L, 4620L, 4028L, 2223L
), obs_2012_1 = c(8000L, 7778L, 4106L, 2810L, 3695L, 5910L, 7268L,
5813L, 9052L, 2032L), obs_2012_2 = c(9526L, 8099L, 4806L, 7093L,
9002L, 1801L, 3529L, 8401L, 3414L, 1154L), obs_2013_1 = c(5846L,
9976L, 6592L, 3463L, 1996L, 2860L, 9272L, 1617L, 4945L, 6295L
), obs_2013_2 = c(2504L, 8277L, 7397L, 9075L, 5465L, 2886L, 6587L,
7687L, 4397L, 6353L), obs_2014_1 = c(5215L, 3211L, 8115L, 5154L,
1607L, 9160L, 8967L, 4313L, 6589L, 3741L), obs_2014_2 = c(2288L,
5682L, 4552L, 3747L, 8444L, 8711L, 9595L, 6814L, 9314L, 9594L
), obs_2015_1 = c(2288L, 5682L, 4552L, 3747L, 8444L, 8711L, 9595L,
6814L, 9314L, 9594L), obs_2015_2 = c(7558L, 8192L, 5014L, 1002L,
9582L, 2138L, 5739L, 8448L, 7849L, 7607L)), class = "data.frame", row.names = c(NA,
-10L))
It is basically yearly observations for regions 1:10
. In the real data, there are much more than two columns per year. In a next step, I want to analyse development over time. Once for the whole period, and then for two-year steps in between. The tricky part is, that the data.frame is updated every year, so it get's "wider" on a regular basis.
The following code gives me a selection vector b
with the id
and all the years I want to analyse:
a <- colnames(df)
b <- str_remove_all(a[c(1, seq(2, length(a), 4), length(a))], "obs_")
b <- str_remove_all(b, "_[0-9]")
So in this case, b
prints:
[1] "id" "2008" "2010" "2012" "2014" "2015"
In a next step, I want to come up with multiple dataframes or a list with multiple dataframes that contain a selection based on my selection vector:
2008
, all columns matching 2010
2010
, all columns matching 2012
2012
, all columns matching 2014
2014
, all columns matching 2015
2008
, all columns matching 2015
Ideally, the resulting data.frames would also include the range in their names, for example df_2008_2010
, df_2010_2012
,...
With this list, I could do my calculations for the development between every two years and between the full period. I made a few attempts with a loop, but they were not very promising.
Thanks for any help and suggestions!
Upvotes: 0
Views: 42
Reputation: 1277
A general advice in data modeling is to define the structures in such a way that they will not change over time. The number of columns in your data frame should thus be fixed and when you add new data, rows should be added. In data base lingo, this means converting your model into the "first normal form".
With your data, this would mean to define a column year holding the year of the measurement. Thn you can extract all data from one year with
df(df$year=2015,)
Upvotes: 2
Reputation: 389055
You can try this :
b <- b[-1]
cols <- names(df)
lapply(seq_along(b), function(i) {
if(i == length(b)) inds <- b[c(i, 1)] else inds <- b[i:(i + 1)]
cbind(df[1], df[grep(paste0(inds, collapse = '|'), cols)])
}) -> list_df
This returns :
list_df
[[1]]
# id obs_2008_1 obs_2008_2 obs_2010_1 obs_2010_2
#1 1 1152 3110 4829 7051
#2 2 2673 1268 9190 4823
#3 3 3727 1757 4061 4350
#4 4 4392 9966 2007 4406
#5 5 1342 2611 8453 2626
#6 6 7423 3916 2670 5556
#7 7 3179 8972 4311 6716
#8 8 6519 3493 2018 7597
#9 9 3524 5538 2008 6405
#10 10 7935 7159 8303 1700
#[[2]]
# id obs_2010_1 obs_2010_2 obs_2012_1 obs_2012_2
#1 1 4829 7051 8000 9526
#2 2 9190 4823 7778 8099
#3 3 4061 4350 4106 4806
#4 4 2007 4406 2810 7093
#5 5 8453 2626 3695 9002
#6 6 2670 5556 5910 1801
#7 7 4311 6716 7268 3529
#8 8 2018 7597 5813 8401
#9 9 2008 6405 9052 3414
#10 10 8303 1700 2032 1154
#...
#...
Upvotes: 1