Dfeld
Dfeld

Reputation: 197

Merging data based on conditions

I have the following data:

Data <- data.frame(Project=c(123,123,123,123,123,123,123,123,124,124,124,124,124,125,125),
               Value=c(1,2,3,4,7,3,8,9,8,3,2,5,6,2,3),
               OldValue=c("","Open","In Progress","Open","In Progress","Complete","Open","In Progress","Complete","Open","In Progress","System Declined","In Progress","","Open"),
               NewValue=c("Open","In Progress","Open","In Progress","Complete","Open","In Progress","Complete","Open","In Progress","System Declined","In Progress","Complete","Open","Complete"))
Data$Request <- ifelse(((Data$OldValue==""|Data$OldValue=="Complete"|Data$OldValue=="System Declined"|Data$OldValue=="In Progress")&Data$NewValue=="Open"),Data$Value,NA)
Data$Start <- ifelse(((Data$OldValue=="Open"|Data$OldValue=="Complete"|Data$OldValue=="System Declined")&Data$NewValue=="In Progress"),Data$Value,NA)
Data$End <- ifelse(((Data$NewValue=="Complete"|Data$NewValue=="System Declined")&(Data$OldValue=="Open"|Data$OldValue=="In Progress")),Data$Value,NA)

I want to be able to determine the number of completed projects with their associated values by unique Project ID. A completed project is one that has both the "Request" and "End" fields populated ("Start" is not a required field).

I'd like to obtain the information using the following criteria:

  1. I'm only interested in capturing the first instance of the "Request", "Start" and "End" fields. Example: top 5 lines of code show two "Request" values, two "Start" values, and one "End" value. I'd want to merge the first instance of each field into one row, so the result would be 1,2,7.

  2. In the same example as above, I'd want the 1,2,7 values to be merged on the first instance of the "NewValue" being "Open", not the second, and not both.

  3. There can be multiple completed projects per uneir Project ID. Example: Project ID 123 should have two completed projects: values 1,2,7 and 3,8,9

This is the result I'm looking for:

Result <- data.frame(Project=c(123,123,124,125),
               Value=c(1,3,8,2),
               OldValue=c("","Complete","Complete",""),
               NewValue=c("Open","Open","Open","Open"),
               Request=c(1,3,8,2),
               Start=c(2,8,3,""),
               End=c(7,9,2,3))

Would really appreciate any help coding this.

Upvotes: 0

Views: 60

Answers (1)

Mouad_Seridi
Mouad_Seridi

Reputation: 2716

it's not clear what the OP wants exactly, but I hope you can tweak this code to obtain the desired result

require(data.table)  
Data <- data.table(Data)
Data2 <- Data[,.(OldValue, NewValue,Request = first(Request[!is.na(Request)]),
          Start = first(Start[!is.na(Start)]),
          End = first(End[!is.na(End)])), 
          .(Project, Value)]

Upvotes: 1

Related Questions