Reputation: 505
I have a dataframe that ı want to create 2 new variable with that group by Id
.
first I need to group by Id
and get most recent date by createdDate
then I need to get Lead_DataSource__c
again based on most recent date
Here is tail of my dataframe;
tail(df)
Id CreatedDate Lead_DataSource__c StageName
0011000001XW3YZAA1 2020-07-17 Walk in Quotation
0011000001XW3Z8AAL 2020-07-17 Walk in Quotation
0011000001XW3zHAAT 2020-07-17 Walk in Assigned
0011000001XW3zlAAD 2020-07-17 Walk in Quotation
0011000001XW3zvAAD 2020-07-17 Walk in Closed Lost
0011000001XW3zvAAD 2020-07-17 Website Closed Lost
Here is my code for this:
df_new<-df %>% group_by(Id)%>%
mutate(numberoflead=length(Id)) %>% #number of lead
mutate(lastcreateddateoflead=max(CreatedDate)) %>%#last date of lead
mutate(lasttouch =max(CreatedDate)[Lead_DataSource__c])%>% #last touch
when I run these codes I didn't get any error it seems work for numberofleads
and lastcreateddateoflead
but it doesn't seem to work for getting lasttouch
Could anyone help me about what I m missing here?
Upvotes: 0
Views: 337
Reputation: 544
Hey I think I get what you are trying to do but it may not be very properly stated. I believe you want the max date per Id
and then max date per Lead_DataSource__c
, if that is what you want to do maybe try:
df_new <- df %>% group_by(Id) %>%
mutate(numberoflead=length(Id)) %>% #number of lead
mutate(lastcreateddateoflead=max(CreatedDate)) %>%
group_by(Lead_DataSource__c) %>%
mutate(lasttouch =max(CreatedDate)) %>%
ungroup()
Let me know if that's what you tried to accomplish!
Upvotes: 1
Reputation: 5747
Your problem is that you are using mutate
when you should be using summarize
. You need to then join the original df
to get lasttouch
. If you add a select
inside the join, you get just get the lasttouch
column without renaming or selecting anything .
library(dplyr)
df %>%
group_by(Id) %>%
summarize(numberoflead = n(),
lastcreateddateoflead=max(CreatedDate)) %>%
inner_join(df %>%
select(Id, CreatedDate, lasttouch = Lead_DataSource__c),
by = c("Id" = "Id", "lastcreateddateoflead" = "CreatedDate"))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 4
Id numberoflead lastcreateddateoflead lasttouch
<chr> <int> <date> <chr>
1 0011000001XW3YZAA1 1 2020-07-17 Walk in
2 0011000001XW3Z8AAL 1 2020-07-17 Walk in
3 0011000001XW3zHAAT 1 2020-07-17 Walk in
4 0011000001XW3zlAAD 1 2020-07-17 Walk in
5 0011000001XW3zvAAD 2 2020-07-17 Walk in
6 0011000001XW3zvAAD 2 2020-07-17 Website
If you want to keep all rows (and not just one summary per Id), then use your mutate instead of my summarize.
df %>%
group_by(Id) %>%
mutate(numberoflead = n(),
lastcreateddateoflead=max(CreatedDate)) %>%
inner_join(df %>%
select(Id, CreatedDate, lasttouch = Lead_DataSource__c),
by = c("Id" = "Id", "lastcreateddateoflead" = "CreatedDate"))
# A tibble: 8 x 7
# Groups: Id [5]
Id CreatedDate Lead_DataSource_~ StageName numberoflead lastcreateddateofl~ lasttouch
<chr> <date> <chr> <chr> <int> <date> <chr>
1 0011000001XW3~ 2020-07-17 Walk in Quotation 1 2020-07-17 Walk in
2 0011000001XW3~ 2020-07-17 Walk in Quotation 1 2020-07-17 Walk in
3 0011000001XW3~ 2020-07-17 Walk in Assigned 1 2020-07-17 Walk in
4 0011000001XW3~ 2020-07-17 Walk in Quotation 1 2020-07-17 Walk in
5 0011000001XW3~ 2020-07-17 Walk in Closed Lo~ 2 2020-07-17 Walk in
6 0011000001XW3~ 2020-07-17 Walk in Closed Lo~ 2 2020-07-17 Website
7 0011000001XW3~ 2020-07-17 Website Closed Lo~ 2 2020-07-17 Walk in
8 0011000001XW3~ 2020-07-17 Website Closed Lo~ 2 2020-07-17 Website
Upvotes: 1