Reputation: 83
I have a data in which monthly cash balances are given per account number. However, the number of records or number of months for which data is given for each account id is different say some have 12 months data , some have 24 months data and so on. I have to put all the data in ARIMA model and predict balance for next month. I have notice that ARIMA model doesnt work for uneven periods or it produces unusual results.
`Account_id Month $ balance
A 201901 100
A 201902 120
A 201903 135
B 201903 20
C 201902 1700
C 201903 1400
` I tried to add the rows for missing months and put zero values against the balances by amending the dataset in excel resulting in all accounts having same number of records and months.
I want to do this manual step through R code. I believe this should be some loop / IF function or Rbind/cbind something but am not that fluent on the code. Please help!
Based on suggested soluton, i tried this:
It is producing 54 lines for each month each id and all balances showing as 0
months <- as.character(seq(as.Date('2015-01-
01'),as.Date('2019-06-01'), by = "1 month"))
accounts <- df$account_id
shell <- expand.grid(Account_id = accounts, Month = months, stringsAsFactors
= F)
data <- data.frame(Account_id = df$account_id, Month =
df$partition_ledger_year_month, balance = df$amount_usd,stringsAsFactors = F)
df2 <- merge(shell, data, by=c('Account_id','Month'), all.x = T)
df2[which(is.na(df2$balance)),]$balance <- 0
Expected output:
`Account_id Month $ balance
A 201901 100
A 201902 120
A 201903 135
B 201901 0
B 201902 0
B 201903 20
C 201901 0
C 201902 1700
C 201903 1400
All values are in my data frame columns, only i have to missing moth with '0' balance. The complete data for any account id is 54 months.
Upvotes: 0
Views: 230
Reputation: 389055
You can use tidyr::complete
and fill the missing values of balance
with 0.
df1 <- tidyr::complete(df, Account_id, Month, fill = list(balance = 0))
df1
# A tibble: 9 x 3
# Account_id Month balance
# <chr> <int> <dbl>
#1 A 201901 100
#2 A 201902 120
#3 A 201903 135
#4 B 201901 0
#5 B 201902 0
#6 B 201903 20
#7 C 201901 0
#8 C 201902 1700
#9 C 201903 1400
data
df <- structure(list(Account_id = c("A", "A", "A", "B", "C", "C"),
Month = c(201901L, 201902L, 201903L, 201903L, 201902L, 201903L
), balance = c(100L, 120L, 135L, 20L, 1700L, 1400L)),
class = "data.frame", row.names = c(NA, -6L))
Upvotes: 0
Reputation: 3402
How about making a shell data.frame with zero balances then filling in the balances you have:
# All Possible Months
months <- as.character(seq(as.Date('2019-01-01'),as.Date('2020-01-01'), by = "1 month"))
# All Possible account ids
accounts <- LETTERS
# A shell
shell <- expand.grid(Account_id = accounts, Month = months, stringsAsFactors = F)
# Your data
data <- data.frame(Account_id = c('A','B','A'), Month = c('2019-02-01', '2019-03-01','2019-01-01'), balance = c(100,200,300),stringsAsFactors = F)
# Left Join to the shell
df <- merge(shell, data, by=c('Account_id','Month'), all.x = T)
# Fill in missing balances
df[which(is.na(df$balance)),]$balance <- 0
df
Upvotes: 1