Remove duplicate rows in one column based on another column and keep other columns intact

Question

I have tried a lot of solutions found here and none have seemed to work correctly; the unique function got me closest. My data looks like:

id   second   var1   var2
100   20       3      4
100   21       3      3
100   22       4      3
100   23       4      3
100   24       4      4 
100   22       3      3
100   23       3      3

About ten seconds of data repeat, usually every 300 or so seconds. Each session is around 1200 seconds. I would like to delete the duplicate seconds within a session and take the mean of var1 and var2 over whatever rows are being collapsed; if not the mean, keeping either original value is OK. Everything I have tried only removes a duplicated second if var1 and var2 are duplicated too.

Preston · Accepted Answer

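This will create a new data frame with one row per combination of id and second, with var1 and var2 averaged over the rows that were collapsed.

To explain: you don't actually need to delete anything. You just need to group var1 and var2 by the columns that identify a duplicate, in this case id and second, and summarise each group.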

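library(tidyverse)

# One row per (id, second); duplicated seconds are collapsed to their mean
new_df <- df %>%
  group_by(id, second) %>%
  summarise(var1 = mean(var1),
            var2 = mean(var2)
            ) %>%
  ungroup()  # drop the leftover grouping by id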
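
As a rough sketch, here is the grouping approach run on the sample rows from the question, next to a distinct() variant for the case where keeping one of the original values is enough. The data frame construction and the names averaged and first_kept are assumptions made for illustration; only id, second, var1 and var2 come from the question.

```r
library(dplyr)

# Sample rows copied from the question (seconds 22 and 23 appear twice)
df <- data.frame(
  id     = rep(100, 7),
  second = c(20, 21, 22, 23, 24, 22, 23),
  var1   = c(3, 3, 4, 4, 4, 3, 3),
  var2   = c(4, 3, 3, 3, 4, 3, 3)
)

# Option 1: average var1 and var2 within each (id, second) pair
averaged <- df %>%
  group_by(id, second) %>%
  summarise(var1 = mean(var1),
            var2 = mean(var2)) %>%
  ungroup()

# Option 2: keep the first row seen for each (id, second) pair,
# leaving var1 and var2 as they were in that row
first_kept <- df %>%
  distinct(id, second, .keep_all = TRUE)
```

Both results have five rows, one per second. In averaged, seconds 22 and 23 get var1 = 3.5; in first_kept they keep the var1 = 4 from their first occurrence. If the data has more than two value columns, the same pattern should extend by listing them in summarise().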
