Reputation: 831
I'm using dplyr to calculate the grouping of a variable [mode], for a series of files. The goal is to build a data frame where you copy (cbind) the results for every file (any ideas on a better data structure welcome).
So files are like this:
FILE 1:
id |mode | no.cases
1 1 3
2 2 1
3 5 2
FILE 'n':
id |mode | no.cases
1 1 11
2 3 7
3 4 1
Some files may have entries for all 5 categories, others only for 2 or 3 of them. I can obtain all the existing categories from a complete file using:
categories<-factor(mode)
Still, the problem appears when using group_by:
result <-group_by(file,mode,sum(no.cases))
result (for file 1):
mode | no.cases
1 3
2 1
5 2
since you can't cbind on different no. of rows: in some cases the resulting data frame will have 5 rows, in some only 3 rows.
How can you use group_by so that it shows, for each file, all the existing categories, and the ones absent showing value 0?
E.g.
result (for file 1):
mode | no.cases
1 3
2 1
3 0
4 0
5 2
Thanks in advance, p.
Upvotes: 0
Views: 72
Reputation: 2095
Assuming that you have put all categories from a complete file into a data frame called categories.df
categories <- c(1,2,3,4,5)
# create data frame
categories.df <- data.frame(categories)
# rename column name
colnames(categories.df)[colnames(categories.df)=="categories"] <- "mode"
> categories.df
mode
1
2
3
4
5
Below is the sample code to merge categories.df with your file1
# Method 1: merge function
df <- merge(x = categories.df, y = file1, by = "mode", all = TRUE)
df$id <- NULL # remove id
df[is.na(df)] <- 0 # replace na with zero
# Method 2: using dplyr
library(dplyr)
df <- left_join(categories.df, file1)
df$id <- NULL # remove id
df[is.na(df)] <- 0 # replace na with zero
# Method 2: using plyr
library(plyr)
df <- join(categories.df, file1, type="left")
df$id <- NULL # remove id
df[is.na(df)] <- 0 # replace na with zero
> df
mode no.cases
1 3
2 1
3 0
4 0
5 2
Hope this helps.
Upvotes: 1