Reputation: 27
I'd like to replace the first df in the example below with rows from dataframe two based on ID column. For example: suppose person X have 100 items on dataframe 1, but when we look at dataframe two, we see that he actually only have 50 items and other 50 are for person Z , so in the final result , we should have a row for person X with 50 item and another row for person Z with 50 item , both have same ID.
Dataframe 1
ID Name Status Items
16 Amy B Closed 100
10 Erik C Closed 80
14 Paul R Closed 20
17 Chris K Closed 40
19 Ali I Closed 60
22 Jenny A Closed 40
Dataframe 2
ID Name Items
14 Paul R 10
14 Sarah K 10
22 Jenny A 30
22 Brian L 10
results
ID Name Status Items
16 Amy B Closed 100
10 Erik C Closed 80
14 Paul R Closed 10
14 Sarah K Closed 10
17 Chris K Closed 40
19 Ali I Closed 60
22 Jenny A Closed 30
22 Brian L Closed 10
Upvotes: 1
Views: 55
Reputation: 1904
It looks like you're doing some merges here, and giving priority to the values for "Items" that are in data frame 2.
Try the code below which uses dplyr
package and left_join()
and full_join()
.
Loading the Data...
df1 <- read.table(header=TRUE, stringsAsFactors = FALSE, text=
'ID Name Status Items
16 Amy_B Closed 100
10 Erik_C Closed 80
14 Paul_R Closed 20
17 Chris_K Closed 40
19 Ali_I Closed 60
22 Jenny_A Closed 40')
df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"ID Name Items
14 Paul_R 10
14 Sarah_K 10
22 Jenny_A 30
22 Brian_L 10")
Merging the tables
# add the status column to df2
df <- left_join(df2, df1 %>% select(ID, Status), by = 'ID')
# ID Name Items Status
# 14 Paul_R 10 Closed
# 14 Sarah_K 10 Closed
# 22 Jenny_A 30 Closed
# 22 Brian_L 10 Closed
# combine both data frames by merging for both ID and Name
df <- full_join(df, df1,
by = c('ID', 'Name', 'Status'),
suffix = c('.1', '.2'))
# ID Name Items.1 Status Items.2
# 14 Paul_R 10 Closed 20
# 14 Sarah_K 10 Closed NA
# 22 Jenny_A 30 Closed 40
# 22 Brian_L 10 Closed NA
# 16 Amy_B NA Closed 100
# 10 Erik_C NA Closed 80
# 17 Chris_K NA Closed 40
# 19 Ali_I NA Closed 60
# create a new column which selects the df2 value if that exists, otherwise uses df1 value
df <- df %>%
mutate(Items = ifelse(is.na(Items.1), Items.2, Items.1)) %>%
select(-Items.1, -Items.2)
# ID Name Status Items
# 14 Paul_R Closed 10
# 14 Sarah_K Closed 10
# 22 Jenny_A Closed 30
# 22 Brian_L Closed 10
# 16 Amy_B Closed 100
# 10 Erik_C Closed 80
# 17 Chris_K Closed 40
# 19 Ali_I Closed 60
Putting it all together...
left_join(df2, df1 %>% select(ID, Status), by = 'ID') %>%
full_join(df1,
by = c('ID', 'Name', 'Status'),
suffix = c('.1', '.2')) %>%
mutate(Items = ifelse(is.na(Items.1), Items.2, Items.)) %>%
select(-Items.1, -Items.2)
Gives the following table as output:
ID Name Status Items
14 Paul_R Closed 10
14 Sarah_K Closed 10
22 Jenny_A Closed 30
22 Brian_L Closed 10
16 Amy_B Closed 100
10 Erik_C Closed 80
17 Chris_K Closed 40
19 Ali_I Closed 60
Upvotes: 1
Reputation: 47300
Assuming your real data is as regular as your sample data, you have redundant information, the important information is:
df1
df2
df3
So what we do is first we add the Status
info to df2 (merge(df2,df1[c(1,3)])
), then we rbind
the relevant item info from df1
and df2
.
rbind(df1[!df1$ID%in% df2$ID,],merge(df2,df1[c(1,3)]))
# ID Name Status Items
# 1 16 Amy B Closed 100
# 2 10 Erik C Closed 80
# 4 17 Chris K Closed 40
# 5 19 Ali I Closed 60
# 11 14 Paul R Closed 10
# 21 14 Sarah K Closed 10
# 3 22 Jenny A Closed 30
# 41 22 Brian L Closed 10
data
df1 <- read.table(text="ID Name Status Items
16 'Amy B ' Closed 100
10 'Erik C ' Closed 80
14 'Paul R ' Closed 20
17 'Chris K' Closed 40
19 'Ali I ' Closed 60
22 'Jenny A' Closed 40",h=T,strin=F)
df2<- read.table(text="ID Name Items
14 'Paul R ' 10
14 'Sarah K' 10
22 'Jenny A' 30
22 'Brian L' 10",h=T,strin=F)
Upvotes: 0