cthulhukk
cthulhukk

Reputation: 99

Extracting rows for which a column's value changes from one date to the next in R

Good evening! I have the following data.frame from which I would like to extract rows for which the qualifier changes from one month to the next. I have no idea how to approach this, unfortunately...

Data:

id <- c("A","B","C","D","A","B","C","D","A","B","C","D")
qualifier <- c("G","H","I","I","I","H","Y","I","I","I","I","I")
date <- c("01/31/15","01/31/15",'01/31/15','01/31/15','02/28/15', "02/28/15", "02/28/15","02/28/15","03/31/15","03/31/15","03/31/15","03/31/15")
sample.data <- data.frame(id,qualifier,date)

The "id" is unique for any given date but may or may not exist on other dates. Each "id" has a qualifier "I" or "H". All dates are month-end. I would like to extract all rows for which the id's qualifier changes from either "I" or "G" to "H" or "Y" from one month to the next. So, to specify, only changes from 01/31/15 to 02/28/15 would be relevant, not changes from 01/31/15 to 03/31/15. Given the data.frame above, the change I would be interested in would be row 7, when the qualifer for "id" C changes from "I" (from row 3) to "Y" . Ideally, the extracted row 7 would have the value of the qualifier from row 3 as well.

The result should look something like this:

id   qualifier  date       qualifier.previous.month
C    H          02/28/15   I

Thanks in advance for any help!

Upvotes: 0

Views: 271

Answers (3)

ashetty
ashetty

Reputation: 61

# Change the date to a Date format
sample.data$date <- as.Date(sample.data$date, format = "%m/%d/%y")

# Collect the month value from the date
sample.data$month <- as.numeric(format(sample.data$date, "%m")) 

# Create a copy of the sample dataframe
sample.data_shifted <- sample.data

# Shift the months by one in the copied dataframe
sample.data_shifted$month <- sample.data_shifted$month + 1

# Merge the dataframes, so each ID/month row has the current and previous month qualifier 
sample.data_merged <- merge(
  sample.data_shifted, sample.data, 
  by = c("id", "month"), 
  suffixes = c(".previous.month", ".current.month")
  )

# Tidying up dates
sample.data_merged$date <- sample.data_merged$date.current.month

sample.data_merged <- subset(sample.data_merged, select = -c(date.previous.month, date.current.month))

# We create a logical vector telling us which qualifiers changed between which months
previous_month_ig <- sample.data_merged$qualifier.previous.month %in% c("I", "G")
current_month_hy <- sample.data_merged$qualifier.current.month %in% c("H", "Y")

# Now we only look at IDs/months where the qualifier changed from I/G to H/Y
sample.data_final <- sample.data_merged[previous_month_ig & current_month_hy, ]

sample.data_final

  id month qualifier.previous.month qualifier.current.month       date
5  C     2                        I                       Y 2015-02-28

This should fit your specifications - it's a bit lengthy to make it clearer. The idea here is to merge back onto your dataframe with the months shifted by one. This should work if you only have one date per month.

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101508

Perhaps a base R option using ave within transform could help

transform(
  sample.data,
  qualifier.previous.month = ave(qualifier, id, FUN = function(x) c(NA, x[-length(x)]))
)

which gives

   id qualifier     date qualifier.previous.month
1   A         G 01/31/15                     <NA>
2   B         H 01/31/15                     <NA>
3   C         I 01/31/15                     <NA>
4   D         I 01/31/15                     <NA>
5   A         I 02/28/15                        G
6   B         H 02/28/15                        H
7   C         Y 02/28/15                        I
8   D         I 02/28/15                        I
9   A         I 03/31/15                        I
10  B         I 03/31/15                        H
11  C         I 03/31/15                        Y
12  D         I 03/31/15                        I

A data.table option using shift can make it as well

> setDT(sample.data)[, qualifier.previous.month := shift(qualifier), id][]
    id qualifier     date qualifier.previous.month
 1:  A         G 01/31/15                     <NA>
 2:  B         H 01/31/15                     <NA>
 3:  C         I 01/31/15                     <NA>
 4:  D         I 01/31/15                     <NA>
 5:  A         I 02/28/15                        G
 6:  B         H 02/28/15                        H
 7:  C         Y 02/28/15                        I
 8:  D         I 02/28/15                        I
 9:  A         I 03/31/15                        I
10:  B         I 03/31/15                        H
11:  C         I 03/31/15                        Y
12:  D         I 03/31/15                        I

Upvotes: 1

tester
tester

Reputation: 1692

library(data.table)

id <- c("A","B","C","D","A","B","C","D","A","B","C","D")
qualifier <- c("G","H","I","I","I","H","Y","I","I","I","I","I")
date <- c("01/31/15","01/31/15",'01/31/15','01/31/15','02/28/15', "02/28/15", "02/28/15","02/28/15","03/31/15","03/31/15","03/31/15","03/31/15")
sample.data <- data.frame(id,qualifier,date)

setDT(sample.data)

sample.data[, qualifier.previous.month := shift(.SD, 1, 0, "lag"), by = id]
> sample.data
    id qualifier     date qualifier.previous.month
 1:  A         G 01/31/15                        0
 2:  B         H 01/31/15                        0
 3:  C         I 01/31/15                        0
 4:  D         I 01/31/15                        0
 5:  A         I 02/28/15                        G
 6:  B         H 02/28/15                        H
 7:  C         Y 02/28/15                        I
 8:  D         I 02/28/15                        I
 9:  A         I 03/31/15                        I
10:  B         I 03/31/15                        H
11:  C         I 03/31/15                        Y
12:  D         I 03/31/15                        I

Upvotes: 0

Related Questions