Reputation: 341
I have something like:
ISBN Date Quantity
3457 2004 10
3457 2004 6
3457 2004 10
3457 2005 7
3457 2005 12
9885 2013 10
9885 2013 6
9855 2013 10
9885 2014 7
9885 2014 12
And I want to get:
ISBN Date Quantity Year
3457 2004 10 1st Year
3457 2004 6 1st Year
3457 2004 10 1st Year
3457 2005 7 2nd Year
3457 2005 12 2nd Year
9885 2013 10 1st Year
9885 2013 6 1st Year
9855 2013 10 1st Year
9885 2014 7 2nd Year
9885 2014 12 2nd Year
And I have this code:
df<-df %>% group_by(ISBN) %>% mutate(Year = ifelse(DateYear > DateYear,"1st Year","2nd Year"))
But I only get "2nd Year" everywhere, so I guess the comparison in ifelse
is not actually comparing the rows in the column "Date". I guess I have to use a for loop, but was thinking that it's some other way in R. How couldI get to the result I need?
Upvotes: 1
Views: 2907
Reputation: 4871
Just for completeness, and because I personally prefer such solutions, here is one using only base R, relying on split
and lapply
to achieve the result. Effectively, it loops over the different values of ISBN.
# examples data (note possible error on line 8, ISBN==9855)
dat0 <- read.table(text="ISBN Date Quantity
3457 2004 10
3457 2004 6
3457 2004 10
3457 2005 7
3457 2005 12
9885 2013 10
9885 2013 6
9855 2013 10
9885 2014 7
9885 2014 12", header=T)
# treat separately (loop using 'lapply')
datlist <- split(dat,dat$ISBN)
datlist <- lapply(datlist,
function(x) within(x, Year <- as.numeric(as.factor(Date))))
# bind together
dat <- do.call(rbind, datlist)
rownames(dat) <- NULL
Output:
# ISBN Date Quantity Year
# 1 3457 2004 10 1
# 2 3457 2004 6 1
# 3 3457 2004 10 1
# 4 3457 2005 7 2
# 5 3457 2005 12 2
# 6 9855 2013 10 1
# 7 9885 2013 10 1
# 8 9885 2013 6 1
# 9 9885 2014 7 2
# 10 9885 2014 12 2
Note that this method rearranges the data set in such a way that the rows are ordered according to ISBN. In addition, I didn't bother coding the Year
column with 1st Year, 2nd Year, ...
etc. because I didn't really see a value beyond having a much simpler format like 1, 2, ...
.
Upvotes: 0
Reputation: 21621
As per mentioned in the comments, should you have more cases, you could do:
library(dplyr)
library(toOrdinal)
df %>%
group_by(ISBN) %>%
mutate(Year = paste(sapply(cumsum(Date != lag(Date, default = 0)), toOrdinal), "Year"))
For exemple:
# ISBN Date Quantity
#1 3457 2004 10
#2 3457 2004 6
#3 3457 2005 10
#4 3457 2006 7
#5 3457 2007 12
#6 9885 2013 10
#7 9885 2014 6
#8 9855 2015 10
#9 9885 2015 7
#10 9885 2016 12
Gives:
#Source: local data frame [10 x 4]
#Groups: ISBN [3]
#
# ISBN Date Quantity Year
# <int> <int> <int> <chr>
#1 3457 2004 10 1st Year
#2 3457 2004 6 1st Year
#3 3457 2005 10 2nd Year
#4 3457 2006 7 3rd Year
#5 3457 2007 12 4th Year
#6 9885 2013 10 1st Year
#7 9885 2014 6 2nd Year
#8 9855 2015 10 1st Year
#9 9885 2015 7 3rd Year
#10 9885 2016 12 4th Year
Upvotes: 1
Reputation: 10954
Use windowing logic:
library(dplyr)
library(readr)
df_foo = read.table(textConnection("ISBN Date Quantity
3457 2004 10
3457 2004 6
3457 2004 10
3457 2005 7
3457 2005 12
9885 2013 10
9885 2013 6
9855 2013 10
9885 2014 7
9885 2014 12"), header = TRUE, stringsAsFactors = FALSE)
df_foo %>%
group_by(ISBN) %>%
arrange(Date) %>%
mutate(
ifelse(
cumsum(Date != lag(Date, default = first(Date))),
"2nd Year", "1st Year"
)
)
Upvotes: 0