adlisval
adlisval

Reputation: 341

R compare current row to the next row (on same column)

I have something like:

ISBN   Date  Quantity
3457   2004  10
3457   2004  6
3457   2004  10
3457   2005  7
3457   2005  12
9885   2013  10
9885   2013  6
9855   2013  10
9885   2014  7
9885   2014  12

And I want to get:

ISBN   Date  Quantity Year
3457   2004  10       1st Year
3457   2004  6        1st Year
3457   2004  10       1st Year
3457   2005  7        2nd Year
3457   2005  12       2nd Year
9885   2013  10       1st Year
9885   2013  6        1st Year
9855   2013  10       1st Year
9885   2014  7        2nd Year
9885   2014  12       2nd Year

And I have this code:

df<-df %>% group_by(ISBN) %>% mutate(Year = ifelse(DateYear > DateYear,"1st Year","2nd Year"))

But I only get "2nd Year" everywhere, so I guess the comparison in ifelse is not actually comparing the rows in the column "Date". I guess I have to use a for loop, but was thinking that it's some other way in R. How couldI get to the result I need?

Upvotes: 1

Views: 2907

Answers (3)

SimonG
SimonG

Reputation: 4871

Just for completeness, and because I personally prefer such solutions, here is one using only base R, relying on split and lapply to achieve the result. Effectively, it loops over the different values of ISBN.

# examples data (note possible error on line 8, ISBN==9855)
dat0 <- read.table(text="ISBN   Date  Quantity
3457   2004  10
3457   2004  6
3457   2004  10
3457   2005  7
3457   2005  12
9885   2013  10
9885   2013  6
9855   2013  10
9885   2014  7
9885   2014  12", header=T)

# treat separately (loop using 'lapply')
datlist <- split(dat,dat$ISBN)
datlist <- lapply(datlist,
  function(x) within(x, Year <- as.numeric(as.factor(Date))))

# bind together
dat <- do.call(rbind, datlist)
rownames(dat) <- NULL

Output:

#    ISBN Date Quantity Year
# 1  3457 2004       10    1
# 2  3457 2004        6    1
# 3  3457 2004       10    1
# 4  3457 2005        7    2
# 5  3457 2005       12    2
# 6  9855 2013       10    1
# 7  9885 2013       10    1
# 8  9885 2013        6    1
# 9  9885 2014        7    2
# 10 9885 2014       12    2

Note that this method rearranges the data set in such a way that the rows are ordered according to ISBN. In addition, I didn't bother coding the Year column with 1st Year, 2nd Year, ... etc. because I didn't really see a value beyond having a much simpler format like 1, 2, ....

Upvotes: 0

Steven Beaupr&#233;
Steven Beaupr&#233;

Reputation: 21621

As per mentioned in the comments, should you have more cases, you could do:

library(dplyr)
library(toOrdinal)

df %>%
  group_by(ISBN) %>%
  mutate(Year = paste(sapply(cumsum(Date != lag(Date, default = 0)), toOrdinal), "Year"))

For exemple:

#   ISBN Date Quantity
#1  3457 2004       10
#2  3457 2004        6
#3  3457 2005       10
#4  3457 2006        7
#5  3457 2007       12
#6  9885 2013       10
#7  9885 2014        6
#8  9855 2015       10
#9  9885 2015        7
#10 9885 2016       12

Gives:

#Source: local data frame [10 x 4]
#Groups: ISBN [3]
#
#    ISBN  Date Quantity     Year
#   <int> <int>    <int>    <chr>
#1   3457  2004       10 1st Year
#2   3457  2004        6 1st Year
#3   3457  2005       10 2nd Year
#4   3457  2006        7 3rd Year
#5   3457  2007       12 4th Year
#6   9885  2013       10 1st Year
#7   9885  2014        6 2nd Year
#8   9855  2015       10 1st Year
#9   9885  2015        7 3rd Year
#10  9885  2016       12 4th Year

Upvotes: 1

tchakravarty
tchakravarty

Reputation: 10954

Use windowing logic:

library(dplyr)
library(readr)

df_foo = read.table(textConnection("ISBN   Date  Quantity
3457   2004  10
3457   2004  6
3457   2004  10
3457   2005  7
3457   2005  12
9885   2013  10
9885   2013  6
9855   2013  10
9885   2014  7
9885   2014  12"), header = TRUE, stringsAsFactors = FALSE)


df_foo %>% 
  group_by(ISBN) %>% 
  arrange(Date) %>% 
  mutate(
    ifelse(
    cumsum(Date != lag(Date, default = first(Date))),
    "2nd Year", "1st Year"
    )
  )

Upvotes: 0

Related Questions