Reputation: 11
I have an existing dataframe to which I would like to add updated observations. I can identify these updated observations by an ID and a time point variable. I've tried removing the outdated observations from the existing dataframe and then tried using the merge() function to merge with a dataframe with just the updated observations, but I get duplicated columns. Is there an elegant way to do this (particularly using dplyr?)
Here's an example of what I'd like to do: Let's say I have a df, called practice
practice
ID Time score 1 score 2
1 hour 1 3 7
1 hour 2 4 2
2 hour 1 3 4
Let's say I want to change the score 1 variable for third observation (for which ID==2 and Time=="hour 1"), from 3 to 5.
What I've tried is making a new dataframe, called practice1:
ID Time score 1 score 2
1 hour 1 3 7
1 hour 2 4 2
Which removes the third observation, and then creating another new dataframe with the corrected observations, called practice2:
ID Time score 1 score 2
2 hour 1 3 4
I've then tried to do something like this:
Practice3 <- merge(practice2, practice1, by = "ID", all = T)
However, I'll get duplicate columns, and when I try to include multiple variables in the by= statement in the merge function, I get this error:
Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column
Which may be due to the longitudinal nature of the data?
Thanks
Upvotes: 0
Views: 1893
Reputation: 9705
You can do in place substitution on a variable in a data frame. E.g.:
practice[["Score 1"]][practice$ID == 2 & practice$Time=="hour 1"] <- 5
Upvotes: 1
Reputation: 43334
If you already have the new data in a data.frame, you can use anti_join
to take out old cases and then just use bind_rows
to add the new cases:
library(dplyr)
practice <- read.table(text = 'ID Time score1 score2
1 hour1 3 7
1 hour2 4 2
2 hour1 3 4',
header = TRUE, stringsAsFactors = FALSE)
practice2 <- read.table(text = 'ID Time score1 score2
2 hour1 5 5',
header = TRUE, stringsAsFactors = FALSE)
practice %>%
anti_join(practice2, by = c('ID', 'Time')) %>%
bind_rows(practice2)
#> ID Time score1 score2
#> 1 1 hour2 4 2
#> 2 1 hour1 3 7
#> 3 2 hour1 5 5
However, that won't work well if practice2
is missing columns, in which case you can use coalesce
to overwrite old values with new ones:
left_join(practice, practice2, by = c('ID', 'Time')) %>%
mutate(score1 = coalesce(score1.y, score1.x),
score2 = coalesce(score2.y, score2.x)) %>%
select(-contains('.'))
#> ID Time score1 score2
#> 1 1 hour1 3 7
#> 2 1 hour2 4 2
#> 3 2 hour1 5 5
Upvotes: 0
Reputation: 33772
Here's an update using dplyr::mutate
. Note: I renamed columns to remove spaces.
library(dplyr)
practice %>%
mutate(score1 = ifelse(ID == 2 & Time == "hour 1", 5, score1))
Upvotes: 0