user3254389
user3254389

Reputation: 31

Merging dataframes in R based on different join conditions?

I have 2 dataframes. If SysId in df2 is 0, then i have to merge to df1 by the AppId else i have to merge on the SysId and AppId together and get the Flag.

I have done it in a round about way in two steps with additional function to create a new column based on the two flag columns after merging. i need to merge based on different join conditions and get it in the same column. Also the way i created looks round about, is there a better way? Thanks in advance

df1 = data.frame(
SysId=rep(1001:1003,3),
AppId=c(rep("A",3),rep("B",3),rep("C",3))
                 )

df2 = data.frame(
SysId=c(1002,1003,0),
AppId=c("A","B","C"),
Flag="Y"
)

df1

    SysId AppId
    1  1001     A
    2  1002     A
    3  1003     A
    4  1001     B
    5  1002     B
    6  1003     B
    7  1001     C
    8  1002     C
    9  1003     C

df2

      SysId AppId Flag
    1  1002     A    Y
    2  1003     B    Y
    3     0     C    Y

    Final Expected Result
    SysId AppId       Flag
    1  1001     A          
    2  1002     A          Y
    3  1003     A
    4  1001     B
    5  1002     B
    6  1003     B           Y
    7  1001     C           Y
    8  1002     C           Y
    9  1003     C           Y

df1 <- merge(x=df1,y=df2[df2$SysId == 0, c("AppId","Flag")],by=c("AppId"), all.x=TRUE)
df1 <- merge(x=df1,y=df2,by=c("SysId","AppId"), all.x=TRUE)

    After Merging two times
      SysId AppId Flag.x Flag.y
    1  1001     A   <NA>   <NA>
    2  1001     B   <NA>   <NA>
    3  1001     C      Y   <NA>
    4  1002     A   <NA>      Y
    5  1002     B   <NA>   <NA>
    6  1002     C      Y   <NA>
    7  1003     A   <NA>   <NA>
    8  1003     B   <NA>      Y
    9  1003     C      Y   <NA>

Upvotes: 0

Views: 47

Answers (1)

ddunn801
ddunn801

Reputation: 1890

You could move the logic from the "merge" to a "key" column, then merge normally on this new column, as so:

df1 <- data.frame(SysId=rep(1001:1003,3),AppId=c(rep("A",3),rep("B",3),rep("C",3)),stringsAsFactors=FALSE)
df2 <- data.frame(SysId=c(1002,1003,0),AppId=c("A","B","C"),Flag="Y",stringsAsFactors=FALSE)

# move the condition to the key
df2$key <- ifelse(df2$SysId==0,df2$AppId,paste0(df2$SysId,df2$AppId))
df1$key <- ifelse(df1$AppId %in% df2$AppId[df2$SysId==0],df1$AppId,paste0(df1$SysId,df1$AppId))

# merge data frames
df1 <- merge(x=df1,y=df2,by="key",all.x=TRUE)

# format results
df1 <- df1[,c("SysId.x","AppId.x","Flag")]
colnames(df1) <- c("SysId","AppId","Flag")
df1 <- df1[order(df1$AppId,df1$SysId),]

Upvotes: 1

Related Questions