Reputation: 11
I want to group a data frame by two columns (department and product line) and output a new data frame that has counts of selected logical values of each department and product line. The structure of the original data is as follows:
product department line date
apple A big 201707
cherry A midlle 201609
potato B midlle 201801
peach C small 201807
pear B big 201807
The date is numeric and the other variables are characters.
I want to add two columns x and y, with x indicating the date is in year 2018, and y indicating the date is 201807. Grouped by department and line, and in descending order. And the output data frame would be like:
department line x y
A big 0 0
A middle 0 0
B big 1 1
B midlle 1 0
C small 1 1
I tried with dplyr. Firstly, I subset the original data to keep only department, line and date columns. Then I set department and line to factors using factor(). When I use str(subdata), I can see department and line is in factor class.
At last, I use group_by, and summarise to get the data frame I want. But the outcome is not what I want.
DF <- subdata %>%
group_by(department, line) %>%
summarise(x = sum(data$date >= 201800, na.rm = TRUE),
y = sum(data$date == 201807, na.rm = TRUE))
Am I doing anything wrong? I've tried with reshape2 package too, but I couldn't get what I want either. I’ve 2936 rows in my data. All I get is like this:
str(DF)
classes ‘grouped_df’, ‘tb_df’, ‘tb1’ and ‘data.frame’: 1 obs. of 4 variables:
$ department : chr department
$ line : chr line
$ x : int 220
$ y : int 29
I think maybe the problem lies in the factor process of the department and line variables. Since the class after the group_by and summarise process is “character” inspite of “factor”. But I can’t figure out the solution.
Can anybody help?
Upvotes: 1
Views: 2991
Reputation: 2299
Here's a different approach using grepl
:
library(tidyverse)
result <- data %>%
group_by(department, line) %>%
summarise(x = as.numeric(grepl("2018", date)),
y = as.numeric(grepl("201807", date)))
result
## A tibble: 5 x 4
## Groups: department [?]
# department line x y
# <fct> <fct> <dbl> <dbl>
#1 A big 0 0
#2 A midlle 0 0
#3 B big 1 1
#4 B midlle 1 0
#5 C small 1 1
data <- read.table(header = TRUE, text = "
product department line date
apple A big 201707
cherry A midlle 201609
potato B midlle 201801
peach C small 201807
pear B big 201807")
Upvotes: 0
Reputation: 2717
Try this one:
library(tidyverse)
df<-data.frame(product=as.character(c("apple","cherry","potato","peach","pear")),
department=as.character(c("A","A","B","C","B")),
line=c("big","midlle","midlle","small","big"),
date=as.character(c("201707","201609","201801","201807","201807")))
df%>%
mutate(yr= as.numeric(str_sub(date,1,4)),
x=ifelse(yr==2018,1,0),
y=ifelse(date=="201807",1,0))%>%
group_by(department,line)%>%
summarise(x=sum(x,na.rm = T),
y=sum(y,na.rm = T))
# A tibble: 5 x 4
# Groups: department [?]
department line x y
<fct> <fct> <dbl> <dbl>
1 A big 0 0
2 A midlle 0 0
3 B big 1 1
4 B midlle 1 0
5 C small 1 1
Upvotes: 0
Reputation: 1376
I'd suggest using ifelse
on the original dataframe beforehand to create columns x and y as such:
df$x <- ifelse(df$date > 201800, 1, 0)
df$y <- ifelse(df$date == 201807, 1, 0)
Now use dplyr to summarise
library(dplyr)
df_new <- df %>% group_by(department, line) %>% summarise(X = sum(x), Y = sum(y))
Upvotes: 0