Extract and combine columns with the same name from a list of dataframes

Question

I got a list of 17 dataframes that contain multiple macroeconomic variables for several countries, and the dataframes' structure is like:

df$CPI
 Date       US    Argentina  Vietnam  India  Indonesia  Philippines 
1564531200  1.8      54.4     2.4      3.1       3.3        2.4      
1561852800  1.6      55.8     2.2      3.2       3.3        2.7
1559260800  1.8      57.3     2.9      3.0       3.3        3.2


df$CapitalAccount
 Date          US    Argentina  Brazil  China  Turkey   Thai 
2019-06-30     0        13.8     49.0   -58.5   -7.2    27.9
2019-03-31     0        32.2     98.1   -26.3   21.4    0.0 
2018-12-31    2721      16.2     59.8   -213.1  0.5     0.0 
2018-06-30    -5        10.9     82.0   -50.6   -2.7    0.0

I'm trying to re-organize those dataframes by country names, like:

US
Date         CPI   CapitalAccount .......(the other 14 macro variables)
2019-06-30  
2019-03-31
2018-12-31

Argentina
Date         CPI   CapitalAccount .......(the other 14 macro variables)
2019-06-30  
2019-03-31
2018-12-31
.
.
.
.

I've tried using a for loop to go through each dataframe in the list of dataframes and grab the column by colnames() of that dataframe, but it's not working and the result gives me many duplicate NAs and Dates.

For US:

for (i in 1:length(df)){
  NewUS <- df[[i]][,which(colnames(df[[i]])=='US')]
  US <- merge(US, NewUS)
  i <- i+1
  }
US


For Argentina:
for (i in 1:length(df)){
  NewArgentina <- df[[i]][,which(colnames(df[[i]])=='Argentina')]
  Argentina <- merge(Argentina, NewArgentina)
  i <- i+1
  }
Argentina

Parfait · Accepted Answer

Consider base R with reshape, chain merge, and split for transformed named list of data frames. Helper functions include higher-order Map and Reduce.

proc_reshape <- function(df, nm) {

    within(data.frame(reshape(df, varying = names(df)[-1], times = names(df)[-1],
                              v.names = nm, timevar = "Country", direction = "long"),
                      row.names = NULL), {
           Date <- as.Date(as.POSIXct(Date, origin = "1970-01-01"))
           rm(id)
    })

}

# ELEMENTWISE LOOP THROUGH DFs AND THEIR NAMES
long_list <- Map(proc_reshape, my_list, names(my_list))

# CHAIN MERGE (FULL JOIN FOR MISMATCHED DATES BY COUNTRY)
merged_df <- Reduce(function(x, y) merge(x, y, by = c("Country", "Date"), all = TRUE), 
                    long_list)

# CREATE NEW NAMED LIST OF DFs
new_list <- split(merged_df, merged_df$Country)

Output

new_list  

$Argentina 
   Country         Date  CPI CapitalAccount 
1  Argentina 2018-06-29   NA           10.9 
2  Argentina 2018-12-30   NA           16.2 
3  Argentina 2019-03-30   NA           32.2 
4  Argentina 2019-05-31 57.3             NA 
5  Argentina 2019-06-29   NA           13.8 
6  Argentina 2019-06-30 55.8             NA 
7  Argentina 2019-07-31 54.4             NA 

$Brazil 
    Country      Date CPI CapitalAccount 
8   Brazil 2018-06-29  NA           82.0 
9   Brazil 2018-12-30  NA           59.8 
10  Brazil 2019-03-30  NA           98.1 
11  Brazil 2019-06-29  NA           49.0 

$China 
    Country       Date CPI CapitalAccount 
12  China   2018-06-29  NA          -50.6 
13  China   2018-12-30  NA         -213.1 
14  China   2019-03-30  NA          -26.3 
15  China   2019-06-29  NA          -58.5 

$India 
    Country       Date CPI CapitalAccount 
16  India   2019-05-31 3.0             NA 
17  India   2019-06-30 3.2             NA 
18  India   2019-07-31 3.1             NA 

$Indonesia 
   Country         Date CPI CapitalAccount 
19 Indonesia 2019-05-31 3.3             NA 
20 Indonesia 2019-06-30 3.3             NA 
21 Indonesia 2019-07-31 3.3             NA 

$Philippines 
   Country           Date CPI CapitalAccount 
22 Philippines 2019-05-31 3.2             NA 
23 Philippines 2019-06-30 2.7             NA 
24 Philippines 2019-07-31 2.4             NA 

$Thai 
   Country       Date CPI CapitalAccount 
25 Thai    2018-06-29  NA            0.0 
26 Thai    2018-12-30  NA            0.0 
27 Thai    2019-03-30  NA            0.0 
28 Thai    2019-06-29  NA           27.9 

$Turkey 
   Country       Date CPI CapitalAccount 
29  Turkey 2018-06-29  NA           -2.7 
30  Turkey 2018-12-30  NA            0.5 
31  Turkey 2019-03-30  NA           21.4 
32  Turkey 2019-06-29  NA           -7.2 

$US 
    Country       Date CPI CapitalAccount 
33       US 2018-06-29  NA             -5 
34       US 2018-12-30  NA           2721 
35       US 2019-03-30  NA              0 
36       US 2019-05-31 1.8             NA 
37       US 2019-06-29  NA              0 
38       US 2019-06-30 1.6             NA 
39       US 2019-07-31 1.8             NA 

$Vietnam 
   Country       Date CPI CapitalAccount 
40 Vietnam 2019-05-31 2.9             NA 
41 Vietnam 2019-06-30 2.2             NA 
42 Vietnam 2019-07-31 2.4             NA

Demo

Extract and combine columns with the same name from a list of dataframes

Answers (2)

Related Questions