Reputation: 45
I have tried a lot of solutions found here and none have seamed to work correctly; unique
function got me closest. My data looks like:
id second var1 var2
100 20 3 4
100 21 3 3
100 22 4 3
100 23 4 3
100 24 4 4
100 22 3 3
100 23 3 3
It repeats about ten seconds usually every 300 or so seconds. Each session is around 1200 seconds. I would like delete duplicate seconds within a session and take the mean of whatever is being collapsed in var1 and var 2 or if not the mean keeping either original value is OK. Everything I have tried only removes duplicates if var1 and var2 are non unique?
Upvotes: 2
Views: 1018
Reputation: 28825
You can do it using base-R
functions:
aggregate(. ~ id+second, dat, mean)
## id second var1 var2
## 1 100 20 3.0 4
## 2 100 21 3.0 3
## 3 100 22 3.5 3
## 4 100 23 3.5 3
## 5 100 24 4.0 4
Using dplyr
package it can be done (similar to previous answer with a little of tweaking):
library(dplyr)
dat %>% group_by(id, second) %>%
summarise_all(mean)
## # A tibble: 5 x 4
## # Groups: id [?]
## id second var1 var2
## <int> <int> <dbl> <dbl>
## 1 100 20 3.0 4
## 2 100 21 3.0 3
## 3 100 22 3.5 3
## 4 100 23 3.5 3
## 5 100 24 4.0 4
If you want to remove duplicates based on two columns (keys), data.table
package would be a good option:
library(data.table)
unique(setDT(dat), by = c('id','second'))
# id second var1 var2
# 1: 100 20 3 4
# 2: 100 21 3 3
# 3: 100 22 4 3
# 4: 100 23 4 3
# 5: 100 24 4 4
Data:
dat <- structure(list(id = c(100L, 100L, 100L, 100L, 100L, 100L, 100L
), second = c(20L, 21L, 22L, 23L, 24L, 22L, 23L), var1 = c(3L,
3L, 4L, 4L, 4L, 3L, 3L), var2 = c(4L, 3L, 3L, 3L, 4L, 3L, 3L)), .Names = c("id",
"second", "var1", "var2"), .internal.selfref = <pointer: 0x02af24a0>, row.names = c(NA,
7L), class = c("data.table", "data.frame"))
Upvotes: 0
Reputation: 8177
This will create a new dataframe with the requirements that you asked for.
To explain, you don't actually need to delete anything, you just need to group the val1/2s by the common values, in this case id and second.
library(tidyverse)
new_df <- df %>%
group_by(id, second) %>%
summarise(var1 = mean(var1),
var2 = mean(var2)
)
Upvotes: 1