Reputation: 3
I got a list of 17 dataframes that contain multiple macroeconomic variables for several countries, and the dataframes' structure is like:
df$CPI
Date US Argentina Vietnam India Indonesia Philippines
1564531200 1.8 54.4 2.4 3.1 3.3 2.4
1561852800 1.6 55.8 2.2 3.2 3.3 2.7
1559260800 1.8 57.3 2.9 3.0 3.3 3.2
df$CapitalAccount
Date US Argentina Brazil China Turkey Thai
2019-06-30 0 13.8 49.0 -58.5 -7.2 27.9
2019-03-31 0 32.2 98.1 -26.3 21.4 0.0
2018-12-31 2721 16.2 59.8 -213.1 0.5 0.0
2018-06-30 -5 10.9 82.0 -50.6 -2.7 0.0
I'm trying to re-organize those dataframes by country names, like:
US
Date CPI CapitalAccount .......(the other 14 macro variables)
2019-06-30
2019-03-31
2018-12-31
Argentina
Date CPI CapitalAccount .......(the other 14 macro variables)
2019-06-30
2019-03-31
2018-12-31
.
.
.
.
I've tried using a for loop to go through each dataframe in the list of dataframes and grab the column by colnames() of that dataframe, but it's not working and the result gives me many duplicate NAs and Dates.
For US:
for (i in 1:length(df)){
NewUS <- df[[i]][,which(colnames(df[[i]])=='US')]
US <- merge(US, NewUS)
i <- i+1
}
US
For Argentina:
for (i in 1:length(df)){
NewArgentina <- df[[i]][,which(colnames(df[[i]])=='Argentina')]
Argentina <- merge(Argentina, NewArgentina)
i <- i+1
}
Argentina
Upvotes: 0
Views: 166
Reputation: 107587
Consider base R with reshape
, chain merge
, and split
for transformed named list of data frames. Helper functions include higher-order Map
and Reduce
.
proc_reshape <- function(df, nm) {
within(data.frame(reshape(df, varying = names(df)[-1], times = names(df)[-1],
v.names = nm, timevar = "Country", direction = "long"),
row.names = NULL), {
Date <- as.Date(as.POSIXct(Date, origin = "1970-01-01"))
rm(id)
})
}
# ELEMENTWISE LOOP THROUGH DFs AND THEIR NAMES
long_list <- Map(proc_reshape, my_list, names(my_list))
# CHAIN MERGE (FULL JOIN FOR MISMATCHED DATES BY COUNTRY)
merged_df <- Reduce(function(x, y) merge(x, y, by = c("Country", "Date"), all = TRUE),
long_list)
# CREATE NEW NAMED LIST OF DFs
new_list <- split(merged_df, merged_df$Country)
Output
new_list
$Argentina
Country Date CPI CapitalAccount
1 Argentina 2018-06-29 NA 10.9
2 Argentina 2018-12-30 NA 16.2
3 Argentina 2019-03-30 NA 32.2
4 Argentina 2019-05-31 57.3 NA
5 Argentina 2019-06-29 NA 13.8
6 Argentina 2019-06-30 55.8 NA
7 Argentina 2019-07-31 54.4 NA
$Brazil
Country Date CPI CapitalAccount
8 Brazil 2018-06-29 NA 82.0
9 Brazil 2018-12-30 NA 59.8
10 Brazil 2019-03-30 NA 98.1
11 Brazil 2019-06-29 NA 49.0
$China
Country Date CPI CapitalAccount
12 China 2018-06-29 NA -50.6
13 China 2018-12-30 NA -213.1
14 China 2019-03-30 NA -26.3
15 China 2019-06-29 NA -58.5
$India
Country Date CPI CapitalAccount
16 India 2019-05-31 3.0 NA
17 India 2019-06-30 3.2 NA
18 India 2019-07-31 3.1 NA
$Indonesia
Country Date CPI CapitalAccount
19 Indonesia 2019-05-31 3.3 NA
20 Indonesia 2019-06-30 3.3 NA
21 Indonesia 2019-07-31 3.3 NA
$Philippines
Country Date CPI CapitalAccount
22 Philippines 2019-05-31 3.2 NA
23 Philippines 2019-06-30 2.7 NA
24 Philippines 2019-07-31 2.4 NA
$Thai
Country Date CPI CapitalAccount
25 Thai 2018-06-29 NA 0.0
26 Thai 2018-12-30 NA 0.0
27 Thai 2019-03-30 NA 0.0
28 Thai 2019-06-29 NA 27.9
$Turkey
Country Date CPI CapitalAccount
29 Turkey 2018-06-29 NA -2.7
30 Turkey 2018-12-30 NA 0.5
31 Turkey 2019-03-30 NA 21.4
32 Turkey 2019-06-29 NA -7.2
$US
Country Date CPI CapitalAccount
33 US 2018-06-29 NA -5
34 US 2018-12-30 NA 2721
35 US 2019-03-30 NA 0
36 US 2019-05-31 1.8 NA
37 US 2019-06-29 NA 0
38 US 2019-06-30 1.6 NA
39 US 2019-07-31 1.8 NA
$Vietnam
Country Date CPI CapitalAccount
40 Vietnam 2019-05-31 2.9 NA
41 Vietnam 2019-06-30 2.2 NA
42 Vietnam 2019-07-31 2.4 NA
Upvotes: 0
Reputation: 105
EDIT: per @Gregor's suggestion. I use idcol
and fill = T
to replace the for loop.
Hope this helps. In the code below, df1
and df2
are dummy data tables. In your case, they will be CPI, CapitalAccount...
First, we select the columns from each table, add a new column in each of the data table in the list called type
and assign the economic variables in the column. Next, we use rbindlist()
to bind the list now that your data tables have the exact columns.
library(data.table)
df1 <- data.table(date = rep(seq(from = as.Date('2019-01-01'), to = as.Date('2019-01-05'), by = 'day'), 5),
US = runif(25),
Argentina = runif(25),
Thailand = runif(25),
China = runif(25))
df2 <- data.table(date = rep(seq(from = as.Date('2019-01-01'), to = as.Date('2019-01-05'), by = 'day'), 5),
US = runif(25),
Argentina = runif(25),
Japan = runif(25))
l1 <- list(df1, df2)
names(l1) <- c('GDP', 'CPI')
x <- rbindlist(l1, idcol = 'type', fill = TRUE) # this works even when the columns are different for each table
Now we have all the data tables combine, we can reshape the table to make look like the result you wanted.
x1 <- melt(x, id.vars = c('date', 'type'), measure.vars = c('US', 'Argentina'), variable.name = 'country', value.name = 'value')
dcast(x1, date + country ~ type, value.var = 'value')
Upvotes: 2