fika_fika

Reputation: 161

Summing R Matrix ignoring NA's

I have the following claim counts data (triangular) by limits:

claims <- matrix(c(2019, 690, 712, NA, 773, 574, NA, NA, 232), nrow = 3, byrow = TRUE)

What would be the most elegant way to do the following simple things resembling Excel's sumif():

  1. put the matrix into as.data.frame() with column names: "100k", "250k", "500k"
  2. sum all numbers except the first row (in this case 773, 574, and 232). I am looking for a neat reference so I can easily generalize the notation to larger claim triangles.

Sum all numbers, ignoring the NAs: sum(claims, na.rm = T). Thanks to Gregor for the suggestion.

I played around with the package ChainLadder a bit and enjoyed how it handles triangular data, especially in plotting and calculating link ratios. I wonder more generally if basic R suffices in doing some quick and dirty sumif() or pairwise link ratio kind of calculations? This would be a bonus for me if anyone out there could dispense some words of wisdom.

Thank you!

Upvotes: 0

Views: 2397

Answers (1)

Gregor Thomas

Reputation: 145965

claims <- matrix(c(2019, 690, 712, NA, 773, 574, NA, NA, 232), nrow = 3, byrow = TRUE)
claims.df <- as.data.frame(claims)
names(claims.df) <- c("100k", "250k", "500k")
# This isn't the best idea because standard column names don't start with numbers.
# If you go non-standard, you'll always have to quote them with backticks, that is
# claims.df$100k   # doesn't work (syntax error, so it is commented out here)
claims.df$`100k`   # works

# sum everything
sum(claims, na.rm = TRUE)

# sum everything except for first row
sum(claims[-1, ], na.rm = TRUE)
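
To generalize this to larger triangles, one option is to build the condition from row and column indices. The following is only a sketch in base R; the variable names rows_after_first, col_totals, and row_totals are made up for illustration.

```r
claims <- matrix(c(2019, 690, 712, NA, 773, 574, NA, NA, 232), nrow = 3, byrow = TRUE)

# row() and col() return index matrices with the same shape as claims,
# so any condition on position becomes a logical mask
rows_after_first <- sum(claims[row(claims) > 1], na.rm = TRUE)

# per-column and per-row totals, ignoring NAs
col_totals <- colSums(claims, na.rm = TRUE)
row_totals <- rowSums(claims, na.rm = TRUE)
```

The mask form gives the same result as claims[-1, ] here, but it also extends to conditions such as row(claims) >= col(claims), which plain negative indexing cannot express.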

It's much easier to give specific advice to specific questions than general advice. As to "I wonder more generally if basic R suffices in doing some quick and dirty sumif() or pairwise link ratio kind of calculations?", at least as to the sumif comment, I'm reminded of fortunes::fortune(286):

...this is kind of like asking "will your Land Rover make it up my driveway?", but I'll assume the question was asked in all seriousness.

sum adds up whatever numbers you give it. Subsetting based on logicals is so simple that there is no need for a separate sumif function. Say you have x = rnorm(100), y = runif(100).

# sum x if x > 0
sum(x[x > 0])

# sum x if y < 0.5
sum(x[y < 0.5])

# sum x if x > 0 and y < 0.5
sum(x[x > 0 & y < 0.5])

# sum every other x
sum(x[c(TRUE, FALSE)])

# sum all but the first 10 and last 10 x
sum(x[-c(1:10, 91:100)])
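
One caveat when the data contain NAs, as the claims triangle does: a logical condition evaluated on an NA is itself NA, and subsetting with it returns an NA element. A small sketch with a made-up vector (x here is illustrative, not the rnorm example above):

```r
x <- c(1, NA, -2, 3)

# the NA in x makes the condition NA at that position, which yields an NA element
picked <- x[x > 0]

# either drop the NAs when summing ...
total <- sum(x[x > 0], na.rm = TRUE)

# ... or use which(), which keeps only positions where the condition is TRUE
total_which <- sum(x[which(x > 0)])
```

Both forms add up only the positive, non-missing values.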

I don't know what a pairwise link ratio is, but I'm willing to bet base R can handle it easily.
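
If it means the ratio of each column to the column before it (an assumption here, not something the question confirms), a minimal base R sketch would divide the matrix without its first column by the matrix without its last column. The names n and ratios are made up for illustration.

```r
claims <- matrix(c(2019, 690, 712, NA, 773, 574, NA, NA, 232), nrow = 3, byrow = TRUE)
n <- ncol(claims)

# element-wise ratio of each column to the previous one;
# drop = FALSE keeps the result a matrix even when only one column remains
ratios <- claims[, -1, drop = FALSE] / claims[, -n, drop = FALSE]
```

Any ratio whose numerator or denominator is NA comes out as NA, so the missing part of the triangle stays missing.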

Upvotes: 1
