Reputation: 99
Good evening! I have the following data.frame from which I would like to extract rows for which the qualifier changes from one month to the next. I have no idea how to approach this, unfortunately...
Data:
id <- c("A","B","C","D","A","B","C","D","A","B","C","D")
qualifier <- c("G","H","I","I","I","H","Y","I","I","I","I","I")
date <- c("01/31/15","01/31/15",'01/31/15','01/31/15','02/28/15', "02/28/15", "02/28/15","02/28/15","03/31/15","03/31/15","03/31/15","03/31/15")
sample.data <- data.frame(id,qualifier,date)
The "id" is unique for any given date but may or may not exist on other dates. Each "id" has a qualifier "I" or "H". All dates are month-end. I would like to extract all rows for which the id's qualifier changes from either "I" or "G" to "H" or "Y" from one month to the next. So, to specify, only changes from 01/31/15 to 02/28/15 would be relevant, not changes from 01/31/15 to 03/31/15. Given the data.frame above, the change I would be interested in would be row 7, when the qualifer for "id" C changes from "I" (from row 3) to "Y" . Ideally, the extracted row 7 would have the value of the qualifier from row 3 as well.
The result should look something like this:
id qualifier date qualifier.previous.month
C H 02/28/15 I
Thanks in advance for any help!
Upvotes: 0
Views: 271
Reputation: 61
# Change the date to a Date format
sample.data$date <- as.Date(sample.data$date, format = "%m/%d/%y")
# Collect the month value from the date
sample.data$month <- as.numeric(format(sample.data$date, "%m"))
# Create a copy of the sample dataframe
sample.data_shifted <- sample.data
# Shift the months by one in the copied dataframe
sample.data_shifted$month <- sample.data_shifted$month + 1
# Merge the dataframes, so each ID/month row has the current and previous month qualifier
sample.data_merged <- merge(
sample.data_shifted, sample.data,
by = c("id", "month"),
suffixes = c(".previous.month", ".current.month")
)
# Tidying up dates
sample.data_merged$date <- sample.data_merged$date.current.month
sample.data_merged <- subset(sample.data_merged, select = -c(date.previous.month, date.current.month))
# We create a logical vector telling us which qualifiers changed between which months
previous_month_ig <- sample.data_merged$qualifier.previous.month %in% c("I", "G")
current_month_hy <- sample.data_merged$qualifier.current.month %in% c("H", "Y")
# Now we only look at IDs/months where the qualifier changed from I/G to H/Y
sample.data_final <- sample.data_merged[previous_month_ig & current_month_hy, ]
sample.data_final
id month qualifier.previous.month qualifier.current.month date
5 C 2 I Y 2015-02-28
This should fit your specifications - it's a bit lengthy to make it clearer. The idea here is to merge back onto your dataframe with the months shifted by one. This should work if you only have one date per month.
Upvotes: 2
Reputation: 101508
Perhaps a base R option using ave
within transform
could help
transform(
sample.data,
qualifier.previous.month = ave(qualifier, id, FUN = function(x) c(NA, x[-length(x)]))
)
which gives
id qualifier date qualifier.previous.month
1 A G 01/31/15 <NA>
2 B H 01/31/15 <NA>
3 C I 01/31/15 <NA>
4 D I 01/31/15 <NA>
5 A I 02/28/15 G
6 B H 02/28/15 H
7 C Y 02/28/15 I
8 D I 02/28/15 I
9 A I 03/31/15 I
10 B I 03/31/15 H
11 C I 03/31/15 Y
12 D I 03/31/15 I
A data.table
option using shift
can make it as well
> setDT(sample.data)[, qualifier.previous.month := shift(qualifier), id][]
id qualifier date qualifier.previous.month
1: A G 01/31/15 <NA>
2: B H 01/31/15 <NA>
3: C I 01/31/15 <NA>
4: D I 01/31/15 <NA>
5: A I 02/28/15 G
6: B H 02/28/15 H
7: C Y 02/28/15 I
8: D I 02/28/15 I
9: A I 03/31/15 I
10: B I 03/31/15 H
11: C I 03/31/15 Y
12: D I 03/31/15 I
Upvotes: 1
Reputation: 1692
library(data.table)
id <- c("A","B","C","D","A","B","C","D","A","B","C","D")
qualifier <- c("G","H","I","I","I","H","Y","I","I","I","I","I")
date <- c("01/31/15","01/31/15",'01/31/15','01/31/15','02/28/15', "02/28/15", "02/28/15","02/28/15","03/31/15","03/31/15","03/31/15","03/31/15")
sample.data <- data.frame(id,qualifier,date)
setDT(sample.data)
sample.data[, qualifier.previous.month := shift(.SD, 1, 0, "lag"), by = id]
> sample.data
id qualifier date qualifier.previous.month
1: A G 01/31/15 0
2: B H 01/31/15 0
3: C I 01/31/15 0
4: D I 01/31/15 0
5: A I 02/28/15 G
6: B H 02/28/15 H
7: C Y 02/28/15 I
8: D I 02/28/15 I
9: A I 03/31/15 I
10: B I 03/31/15 H
11: C I 03/31/15 Y
12: D I 03/31/15 I
Upvotes: 0