baldirony
baldirony

Reputation: 11

Easy way to add observations to an existing dataframe?

I have an existing dataframe to which I would like to add updated observations. I can identify these updated observations by an ID and a time point variable. I've tried removing the outdated observations from the existing dataframe and then tried using the merge() function to merge with a dataframe with just the updated observations, but I get duplicated columns. Is there an elegant way to do this (particularly using dplyr?)

Here's an example of what I'd like to do: Let's say I have a df, called practice

practice

ID     Time  score 1 score 2 
 1   hour 1        3       7
 1   hour 2        4       2
 2   hour 1        3       4

Let's say I want to change the score 1 variable for third observation (for which ID==2 and Time=="hour 1"), from 3 to 5.

What I've tried is making a new dataframe, called practice1:

ID     Time  score 1  score 2 
 1   hour 1        3        7
 1   hour 2        4        2

Which removes the third observation, and then creating another new dataframe with the corrected observations, called practice2:

   ID     Time  score 1  score 2 
    2   hour 1        3        4

I've then tried to do something like this:

Practice3 <- merge(practice2, practice1, by = "ID", all = T)

However, I'll get duplicate columns, and when I try to include multiple variables in the by= statement in the merge function, I get this error:

Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column

Which may be due to the longitudinal nature of the data?

Thanks

Upvotes: 0

Views: 1893

Answers (3)

thc
thc

Reputation: 9705

You can do in place substitution on a variable in a data frame. E.g.:

practice[["Score 1"]][practice$ID == 2 & practice$Time=="hour 1"] <- 5

Upvotes: 1

alistaire
alistaire

Reputation: 43334

If you already have the new data in a data.frame, you can use anti_join to take out old cases and then just use bind_rows to add the new cases:

library(dplyr)

practice <- read.table(text = 'ID     Time  score1 score2 
                                1    hour1       3      7
                                1    hour2       4      2
                                2    hour1       3      4', 
                       header = TRUE, stringsAsFactors = FALSE)

practice2 <- read.table(text = 'ID     Time  score1  score2 
                                 2    hour1       5       5', 
                        header = TRUE, stringsAsFactors = FALSE)

practice %>% 
    anti_join(practice2, by = c('ID', 'Time')) %>% 
    bind_rows(practice2)

#>   ID  Time score1 score2
#> 1  1 hour2      4      2
#> 2  1 hour1      3      7
#> 3  2 hour1      5      5

However, that won't work well if practice2 is missing columns, in which case you can use coalesce to overwrite old values with new ones:

left_join(practice, practice2, by = c('ID', 'Time')) %>% 
    mutate(score1 = coalesce(score1.y, score1.x), 
           score2 = coalesce(score2.y, score2.x)) %>% 
    select(-contains('.'))

#>   ID  Time score1 score2
#> 1  1 hour1      3      7
#> 2  1 hour2      4      2
#> 3  2 hour1      5      5

Upvotes: 0

neilfws
neilfws

Reputation: 33772

Here's an update using dplyr::mutate. Note: I renamed columns to remove spaces.

library(dplyr)
practice %>% 
  mutate(score1 = ifelse(ID == 2 & Time == "hour 1", 5, score1))

Upvotes: 0

Related Questions