stuck_in_the_middle

Reputation: 27

Generate column name using name of objects in list with apply in R

Hope this question can help other R users too.

There is a list with several objects (initial data frames), all with the same structure. The goal is to melt each object and replace the name of the third variable with the name of the corresponding object, using lapply.

The data frames are:

gdp <- data.frame(date = as.Date(c('2010-03-31','2010-06-30','2010-09-30','2010-12-31')),
       id1 = rnorm(4), id2 = rnorm(4), id3 = rnorm(4));
employ <- data.frame(date = as.Date(c('2010-03-31','2010-06-30','2010-09-30','2010-12-31')),
       id1 = rnorm(4), id2 = rnorm(4), id3 = rnorm(4));
fdi <- data.frame(date = as.Date(c('2010-03-31','2010-06-30','2010-09-30','2010-12-31')),
       id1 = rnorm(4), id2 = rnorm(4), id3 = rnorm(4));

The list including the data frames is:

data.list <- list(gdp=gdp, employ=employ, fdi=fdi);

My attempt at melting each object in the list into a panel data structure (melt with id = c("date")) and renaming the third variable (called value after melting) to the name of the respective object (gdp, employ or fdi) is as follows:

data.list <- lapply(data.list, function(x) {
    x <- melt(x, id = c("date"));
    setnames(x, c("date", "id", paste(names(data.list[x])))); x});

However, this results in the following error message:

"Error in data.list[x] : invalid subscript type 'list'"

Thanks for sharing your knowledge!

Upvotes: 1

Views: 77

Answers (3)

akrun

Reputation: 887881

We could use hadleyverse syntax with the purrr and tidyr packages (setnames comes from data.table):

library(purrr)
library(tidyr)
library(data.table)
data.list %>% 
    map(~gather(., id, value, id1:id3)) %>% #convert to long format
    map2(names(data.list), ~setnames(.x, 'value', .y)) #change the column names
# $gdp
#         date      id        gdp
#1  2010-03-31      id1 -0.7772369
#2  2010-06-30      id1 -0.8056224
#3  2010-09-30      id1  0.8542292
#4  2010-12-31      id1 -1.1872451
#5  2010-03-31      id2  0.8328595
#6  2010-06-30      id2 -0.2474831
#7  2010-09-30      id2 -0.9848888
#8  2010-12-31      id2 -1.3365007
#9  2010-03-31      id3 -0.8461187
#10 2010-06-30      id3  0.3711446
#11 2010-09-30      id3 -1.1862064
#12 2010-12-31      id3  1.1424022

#$employ
#         date      id     employ
#1  2010-03-31      id1  2.7989326
#2  2010-06-30      id1 -1.2110057
#3  2010-09-30      id1 -0.7821650
#4  2010-12-31      id1 -0.3791048
#5  2010-03-31      id2  0.1013004
#6  2010-06-30      id2  1.3332404
#7  2010-09-30      id2 -1.3893301
#8  2010-12-31      id2 -0.8440842
#9  2010-03-31      id3 -0.1077106
#10 2010-06-30      id3 -0.7705078
#11 2010-09-30      id3  1.4519592
#12 2010-12-31      id3 -0.8737978

#$fdi
#         date      id         fdi
#1  2010-03-31      id1  1.23107035
#2  2010-06-30      id1 -0.26811221
#3  2010-09-30      id1  0.33061470
#4  2010-12-31      id1 -0.32557342
#5  2010-03-31      id2 -0.30207594
#6  2010-06-30      id2 -0.41945723
#7  2010-09-30      id2 -0.20942161
#8  2010-12-31      id2 -0.79545903
#9  2010-03-31      id3 -0.01117631
#10 2010-06-30      id3  0.99176069
#11 2010-09-30      id3  0.22381746
#12 2010-12-31      id3 -0.25679217

NOTE: This could also be done with a single map2 call, but I think the process is easier to follow in two steps.
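
For reference, the single-step variant mentioned in the note could look roughly like the sketch below. It rebuilds the example data with a hypothetical helper, make_df, so that the block runs on its own, and it renames the column with base R's names<- instead of data.table::setnames, which is an assumption made here to avoid modifying the input by reference.

```r
library(purrr)
library(tidyr)

# Rebuild the example data from the question (make_df is a hypothetical helper).
dates <- as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31"))
make_df <- function() {
  data.frame(date = dates, id1 = rnorm(4), id2 = rnorm(4), id3 = rnorm(4))
}
data.list <- list(gdp = make_df(), employ = make_df(), fdi = make_df())

# One pass over the data frames and their names together:
# reshape to long format, then rename the value column to the list name.
result <- map2(data.list, names(data.list), function(df, nm) {
  long <- gather(df, id, value, id1:id3)
  names(long)[names(long) == "value"] <- nm
  long
})
```

map2 keeps the names of its first argument, so result is still addressable as result$gdp, result$employ and result$fdi.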

Upvotes: 0

akuiper

Reputation: 215117

Map is better suited here, because you can pass the data and the names at the same time and set the value name accordingly. When you use lapply on the list itself, the function receives only the element, not its name or its index, so it is not a good fit:

library(reshape2)  # provides melt()
Map(function(data, name) melt(data, id = "date", value.name = name), data.list, names(data.list))

# $gdp
#          date variable         gdp
# 1  2010-03-31      id1 -0.98490642
# 2  2010-06-30      id1 -0.65785037
# 3  2010-09-30      id1  1.84931510
# 4  2010-12-31      id1 -0.01380012
# 5  2010-03-31      id2 -1.07489986
# 6  2010-06-30      id2 -0.53073153
# 7  2010-09-30      id2 -0.41319361
# 8  2010-12-31      id2  0.07883559
# 9  2010-03-31      id3 -0.32027747
# 10 2010-06-30      id3  2.44528354
# 11 2010-09-30      id3  0.77611010
# 12 2010-12-31      id3 -0.20826479
# 
# $employ
#          date variable     employ
# 1  2010-03-31      id1 -0.8094097
# 2  2010-06-30      id1  0.1384562
# 3  2010-09-30      id1  0.5859650
# 4  2010-12-31      id1 -0.5393965
# 5  2010-03-31      id2 -1.0970997
# 6  2010-06-30      id2  1.0017547
# 7  2010-09-30      id2 -0.6750567
# 8  2010-12-31      id2 -0.2550456
# 9  2010-03-31      id3  0.8593821
# 10 2010-06-30      id3 -0.1797962
# 11 2010-09-30      id3 -0.9969474
# 12 2010-12-31      id3  1.9796193
# 
# $fdi
#          date variable          fdi
# 1  2010-03-31      id1 -0.003560763
# 2  2010-06-30      id1 -1.034493176
# 3  2010-09-30      id1 -0.382924576
# 4  2010-12-31      id1 -1.634971043
# 5  2010-03-31      id2  1.069739934
# 6  2010-06-30      id2 -0.953591914
# 7  2010-09-30      id2  0.980699511
# 8  2010-12-31      id2 -1.939297092
# 9  2010-03-31      id3  0.224597714
# 10 2010-06-30      id3 -0.199469601
# 11 2010-09-30      id3  0.710024455
# 12 2010-12-31      id3 -1.716196075
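
If lapply is preferred anyway, one workaround is to iterate over the names rather than the elements and look each data frame up inside the function. The attempt in the question failed because x was the data frame itself, so data.list[x] tried to index the list with a list. A minimal sketch, assuming reshape2 is installed and using a hypothetical helper, make_df, to rebuild the example data:

```r
library(reshape2)

# Rebuild the example data from the question (make_df is a hypothetical helper).
dates <- as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31"))
make_df <- function() {
  data.frame(date = dates, id1 = rnorm(4), id2 = rnorm(4), id3 = rnorm(4))
}
data.list <- list(gdp = make_df(), employ = make_df(), fdi = make_df())

# Loop over the names, so each name is available for value.name.
melted <- lapply(names(data.list), function(name) {
  melt(data.list[[name]], id.vars = "date", value.name = name)
})

# lapply over an unnamed character vector returns an unnamed list,
# so the names are restored explicitly.
melted <- setNames(melted, names(data.list))
```

Compared with Map, this needs the extra setNames step, which is one reason Map reads more cleanly for this task.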

Upvotes: 3

Rentrop

Reputation: 21507

I understand the question a little differently than Psidom: you can use melt directly on the list, via reshape2::melt.list

melt(data.list, id=c("date"))

Which results in:

         date variable       value     L1
1  2010-03-31      id1  1.25281857    gdp
2  2010-06-30      id1 -0.48590454    gdp
3  2010-09-30      id1 -0.76352141    gdp
4  2010-12-31      id1 -0.74724889    gdp
5  2010-03-31      id2 -1.18055685    gdp
6  2010-06-30      id2 -0.28217948    gdp
7  2010-09-30      id2  0.69016828    gdp
8  2010-12-31      id2 -0.55827152    gdp
9  2010-03-31      id3  0.30202935    gdp
10 2010-06-30      id3  0.74974718    gdp
11 2010-09-30      id3 -0.57454843    gdp
12 2010-12-31      id3  0.24156810    gdp
13 2010-03-31      gdp  1.00000000    gdp
14 2010-06-30      gdp  1.00000000    gdp
15 2010-09-30      gdp  1.00000000    gdp
16 2010-12-31      gdp  1.00000000    gdp
17 2010-03-31      id1 -0.23128530 employ
18 2010-06-30      id1 -0.15230297 employ
19 2010-09-30      id1 -0.36702926 employ
20 2010-12-31      id1  0.73848140 employ
21 2010-03-31      id2  0.95324433 employ
22 2010-06-30      id2 -0.64710459 employ
23 2010-09-30      id2 -1.29508378 employ
24 2010-12-31      id2  1.40630293 employ
25 2010-03-31      id3 -2.25220973 employ
26 2010-06-30      id3  0.23300536 employ
27 2010-09-30      id3 -0.25745376 employ
28 2010-12-31      id3  0.81838150 employ
29 2010-03-31      id1  0.24334109    fdi
30 2010-06-30      id1 -1.06549136    fdi
31 2010-09-30      id1 -0.03566445    fdi
32 2010-12-31      id1  0.37610557    fdi
33 2010-03-31      id2 -1.11626811    fdi
34 2010-06-30      id2 -0.59906541    fdi
35 2010-09-30      id2 -0.34006607    fdi
36 2010-12-31      id2  1.02040731    fdi
37 2010-03-31      id3  0.65030238    fdi
38 2010-06-30      id3 -0.09420529    fdi
39 2010-09-30      id3 -0.34264768    fdi
40 2010-12-31      id3  0.89456456    fdi
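
If one column per indicator is wanted in the end, the long result can be spread by L1. The following is only a sketch under that assumption: it uses reshape2::dcast, and a hypothetical helper, make_df, rebuilds the example data so the block runs on its own.

```r
library(reshape2)

# Rebuild the example data from the question (make_df is a hypothetical helper).
dates <- as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31"))
make_df <- function() {
  data.frame(date = dates, id1 = rnorm(4), id2 = rnorm(4), id3 = rnorm(4))
}
data.list <- list(gdp = make_df(), employ = make_df(), fdi = make_df())

# Melt the whole list at once; L1 records which list element a row came from.
long <- melt(data.list, id.vars = "date")

# Spread L1 back out so each indicator gets its own column,
# keyed by date and variable (id1, id2, id3).
panel <- dcast(long, date + variable ~ L1, value.var = "value")
```

Here panel has one row per date and id, with gdp, employ and fdi as separate columns.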

Upvotes: 3
