A.B.
A.B.

Reputation: 83

Adding values in data frame based on condition using Loops or IF function - in R code

I have a data in which monthly cash balances are given per account number. However, the number of records or number of months for which data is given for each account id is different say some have 12 months data , some have 24 months data and so on. I have to put all the data in ARIMA model and predict balance for next month. I have notice that ARIMA model doesnt work for uneven periods or it produces unusual results.

`Account_id  Month  $ balance
A            201901 100
A            201902 120
A            201903 135
B            201903 20
C            201902 1700
C            201903 1400

` I tried to add the rows for missing months and put zero values against the balances by amending the dataset in excel resulting in all accounts having same number of records and months.

I want to do this manual step through R code. I believe this should be some loop / IF function or Rbind/cbind something but am not that fluent on the code. Please help!

Based on suggested soluton, i tried this:

It is producing 54 lines for each month each id and all balances showing as 0

months <- as.character(seq(as.Date('2015-01-
01'),as.Date('2019-06-01'), by = "1 month"))

accounts <- df$account_id

shell <- expand.grid(Account_id = accounts, Month = months, stringsAsFactors 
= F)

data <- data.frame(Account_id = df$account_id, Month = 
df$partition_ledger_year_month, balance = df$amount_usd,stringsAsFactors = F)

df2 <- merge(shell, data, by=c('Account_id','Month'), all.x = T)

df2[which(is.na(df2$balance)),]$balance <- 0

Expected output:

`Account_id  Month  $ balance
A            201901 100
A            201902 120
A            201903 135
B            201901 0
B            201902 0
B            201903 20
C            201901 0
C            201902 1700
C            201903 1400

All values are in my data frame columns, only i have to missing moth with '0' balance. The complete data for any account id is 54 months.

Upvotes: 0

Views: 230

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389055

You can use tidyr::complete and fill the missing values of balance with 0.

df1 <- tidyr::complete(df, Account_id, Month, fill = list(balance = 0))
df1
# A tibble: 9 x 3
#  Account_id  Month balance
#  <chr>       <int>   <dbl>
#1 A          201901     100
#2 A          201902     120
#3 A          201903     135
#4 B          201901       0
#5 B          201902       0
#6 B          201903      20
#7 C          201901       0
#8 C          201902    1700
#9 C          201903    1400

data

df <- structure(list(Account_id = c("A", "A", "A", "B", "C", "C"), 
    Month = c(201901L, 201902L, 201903L, 201903L, 201902L, 201903L
    ), balance = c(100L, 120L, 135L, 20L, 1700L, 1400L)), 
    class = "data.frame", row.names = c(NA, -6L))

Upvotes: 0

SmokeyShakers
SmokeyShakers

Reputation: 3402

How about making a shell data.frame with zero balances then filling in the balances you have:

# All Possible Months
months <- as.character(seq(as.Date('2019-01-01'),as.Date('2020-01-01'), by = "1 month"))

# All Possible account ids
accounts <- LETTERS

# A shell
shell <- expand.grid(Account_id = accounts, Month = months, stringsAsFactors = F)

# Your data
data <- data.frame(Account_id = c('A','B','A'), Month = c('2019-02-01', '2019-03-01','2019-01-01'), balance = c(100,200,300),stringsAsFactors = F)

# Left Join to the shell
df <- merge(shell, data, by=c('Account_id','Month'), all.x = T)

# Fill in missing balances
df[which(is.na(df$balance)),]$balance <- 0


df

Upvotes: 1

Related Questions