Syed Riaz Mahmood Ali
Syed Riaz Mahmood Ali

Reputation: 105

generate maximum values in a large dataset

I have a data frame named data_frame like this

 period   X      Y   Z
    1     10    20  12
    2     15    14  17
    3     20    35  29
    4     22    45  25
    5     27    16  28
    6     12    20  25

Now I want to generate another df by having the maximum value of first two, second two, third period two where expected output is like this

period  X   Y   Z
  1    15   20  17
  2    22   45  29
  3    27   20  28

Upvotes: 0

Views: 40

Answers (3)

Rui Barradas
Rui Barradas

Reputation: 76450

The following will do what you want.

df2 <- data_frame
df2$fact <- (data_frame$period + 1) %/% 2

df <- aggregate(. ~ fact, df2, max)
rm(df2)
df <- df[, -2]
names(df)[1] <- names(data_frame)[1]
df
#  period  X  Y  Z
#1      1 15 20 17
#2      2 22 45 29
#3      3 27 20 28

DATA

data_frame <-
structure(list(period = 1:6, X = c(10L, 15L, 20L, 22L, 27L, 12L
), Y = c(20L, 14L, 35L, 45L, 16L, 20L), Z = c(12L, 17L, 29L, 
25L, 28L, 25L)), .Names = c("period", "X", "Y", "Z"), class = "data.frame", row.names = c(NA, 
-6L))

EDIT.
If you want periods of length 3 adjust fact accordingly.

df2$fact <- (data_frame$period + 2) %/% 3

More generally, for periods of length n,

df2$fact <- (data_frame$period + n - 1) %/% n

Upvotes: 1

Aramis7d
Aramis7d

Reputation: 2496

with tidyverse, you can use something like:

data_frame %>%
  mutate(k = period) %>%
  mutate(k = if_else( (mod(k,2)==0), lag(k,1), k)) %>%
  group_by(k) %>%
  summarise_all(max)

and then do a select to keep only the required columns.

Upvotes: 0

ottlngr
ottlngr

Reputation: 1257

You can use dplyr and an additional index column to group by and find the maximum:

d <- data_frame(X = sample(1:20, 6, FALSE),
                Y = sample(1:20, 6, FALSE),
                Z = sample(1:20, 6, FALSE))

d
# A tibble: 6 x 4
      X     Y     Z index
  <int> <int> <int> <dbl>
1    15    12    12     1
2    19     5    18     1
3    14     6    20     2
4    20     8    11     2
5     6    13    16     3
6     1    17     9     3

library(dplyr)

d %>% 
  mutate(index = c(1,1,2,2,3,3)) %>% 
  group_by(index) %>% 
  summarise_all(max)

# A tibble: 3 x 4
  index     X     Y     Z
  <dbl> <dbl> <dbl> <dbl>
1     1    19    12    18
2     2    20     8    20
3     3     6    17    16

Upvotes: 1

Related Questions