How can I impute observations on one variable in a list of dataframes? (dyadic time series)

Question

I have several individual CSV files, one per country pair, containing trade volumes for the years 1870-2020 (from the COW trade dataset; the variable here is smoothtotrade). Unfortunately, the dataset only runs through 2014, so all later values are NA.

After trying several things to impute/forecast the missing data, I've decided it might be best to just carry forward the last available value (i.e., smoothtotrade in 2014). However, I can't get it to work. I've been using the na_locf function from the imputeTS package. Can someone help me out?

The list of data frames is called data_frames. My current code:

library(imputeTS)

# Imputation function: carry the last non-missing value forward (LOCF)

impute_smoothtotrade <- function(ts_data) {

  ts_data_imputed <- na_locf(ts_data, option = "locf")
  
  return(ts_data_imputed)
}

# Loop through each data frame (time series) in the list

for (i in seq_along(data_frames)) {
  
  data_frames[[i]]$smoothtotrade <- impute_smoothtotrade(data_frames[[i]]$smoothtotrade)
}

This is the result for a random country pair. The 2014 value (9.018300) was clearly not carried forward as intended:

51    AUT    CMR 2010     11.484859  
52    AUT    CMR 2011     10.393110  
53    AUT    CMR 2012      6.902980  
54    AUT    CMR 2013      4.058900  
55    AUT    CMR 2014      9.018300  
89    AUT    CMR 2015      2.582298  
90    AUT    CMR 2016      2.582298  
91    AUT    CMR 2017      2.582298  
92    AUT    CMR 2018      2.582298  
93    AUT    CMR 2019      2.582298  
94    AUT    CMR 2020      2.582298
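
One thing that may be worth checking: the row labels in the output jump from 55 (2014) to 89 (2015), which suggests the rows might not be stored in year order, so "the last value" seen by LOCF may belong to some other year. The sketch below illustrates that idea on made-up data. It is an assumption, not a confirmed diagnosis: the column name `year` is a guess (the output above shows no column names), and `locf_vec` is a hypothetical base-R helper used only to keep the example self-contained. `imputeTS::na_locf` could be used in its place.

```r
# Hypothetical helper: carry the last non-missing value forward (base R only).
# Leading NAs, which have nothing before them, stay NA.
locf_vec <- function(x) {
  idx <- cumsum(!is.na(x))          # position of the most recent non-NA value
  c(NA, x[!is.na(x)])[idx + 1]      # idx == 0 maps to the leading NA
}

# Made-up toy data standing in for one element of data_frames.
# The column name `year` is an assumption; rows are deliberately out of order.
toy <- data.frame(
  year          = c(2013, 2014, 1990, 2015, 2016),
  smoothtotrade = c(4, 9, 2.5, NA, NA)
)
data_frames <- list(toy)

# Sort each data frame by year first, then carry the last value forward.
data_frames <- lapply(data_frames, function(df) {
  df <- df[order(df$year), ]
  df$smoothtotrade <- locf_vec(df$smoothtotrade)
  df
})

data_frames[[1]]
```

In the toy data, filling without sorting would copy 2.5 (the 1990 value) into 2015 and 2016; after sorting, both years get 9, the 2014 value.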
