Arthur Carvalho Brito

Reputation: 560

How to aggregate using ddply in R when not every group contains every level of a variable

I am having trouble using combinations of ddply and merge to aggregate some variables. The data frame that I am using is really large, so I am putting an example below:

data_sample <- cbind.data.frame(c(123,123,123,321,321,134,145,000),
                               c('j', 'f','j','f','f','o','j','f'),
                               c(seq(110,180, by = 10)))

colnames(data_sample) <- c('Person','Expense_Type','Expense_Value')

I want to calculate, for each person, the share of type j expenses in that person's total expenses.

library(plyr)
data_sample2 <- ddply(data_sample, c('Person'), transform, total = sum(Expense_Value))
data_sample2 <- ddply(data_sample2, c('Person','Expense_Type'), transform, empresa = sum(Expense_Value))

This is what I've done to list total expenses by type. The problem is that not all individuals have expenses of type j, in which case their percentage should be 0, and I don't know how to keep only one row per person with the percentage of total expenses that is of type j.

I may not have made myself clear.

Thank you!

Upvotes: 1

Views: 190

Answers (1)

bouncyball

Reputation: 10771

We can use the by function:

by(data_sample, data_sample$Person, FUN = function(dat){
    sum(dat[dat$Expense_Type == 'j',]$Expense_Value) / sum(dat$Expense_Value)
})

We could also make use of the dplyr package:

library(dplyr)
data_sample %>%
    group_by(Person) %>%
    summarise(Percent_J = sum(ifelse(Expense_Type == 'j', Expense_Value, 0)) / sum(Expense_Value))

# A tibble: 5 × 2
  Person Percent_J
   <dbl>     <dbl>
1      0 0.0000000
2    123 0.6666667
3    134 0.0000000
4    145 1.0000000
5    321 0.0000000
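
Since the question asks about ddply specifically, the same idea can probably be expressed with plyr as well. The following is only a sketch, assuming the plyr package is installed; the name percent_j is made up for illustration, and the sample data is rebuilt with data.frame so the block runs on its own. Functions are namespace-qualified so that loading dplyr alongside plyr does not change which summarise is used.

```r
# Rebuild the sample data from the question so this block is self-contained.
data_sample <- data.frame(
  Person        = c(123, 123, 123, 321, 321, 134, 145, 0),
  Expense_Type  = c('j', 'f', 'j', 'f', 'f', 'o', 'j', 'f'),
  Expense_Value = seq(110, 180, by = 10)
)

# One row per person: type j expenses divided by total expenses.
# Subsetting with a logical vector that is all FALSE gives an empty
# vector, and sum() of an empty vector is 0, so people without any
# type j expense get 0 rather than NA.
percent_j <- plyr::ddply(
  data_sample, 'Person', plyr::summarise,
  Percent_J = sum(Expense_Value[Expense_Type == 'j']) / sum(Expense_Value)
)
```

Using summarise instead of transform is what collapses each group to a single row, which is the "one line per person" the question asks for. Multiply Percent_J by 100 if an actual percentage is wanted rather than a proportion.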

Upvotes: 1
