Reputation:
Consider a data frame with two numeric columns (x
, y
) and a categorical column (category
) containing a string:
d1 <- data.frame(x = c(0, 1, 2, 5, 6.5, 8), y = c(0, 2, 3, 5, 5.5, 5), category = "A")
d2 <- data.frame(x = c(0, 1, 2, 4, 6, 8), y = c(0, 3, 3.5, 4, 4, 5), category = "B")
d3 <- data.frame(x = c(0, 1.5, 3, 4, 6.5, 8), y = c(0, 2.5, 3, 4, 4.5, 5), category = "C")
df <- rbind(d1, d2, d3)
> df
x y category
1 0.0 0.0 A
2 1.0 2.0 A
3 2.0 3.0 A
4 5.0 5.0 A
5 6.5 5.5 A
6 8.0 5.0 A
7 0.0 0.0 B
8 1.0 3.0 B
9 2.0 3.5 B
10 4.0 4.0 B
11 6.0 4.0 B
12 8.0 5.0 B
13 0.0 0.0 C
14 1.5 2.5 C
15 3.0 3.0 C
16 4.0 4.0 C
17 6.5 4.5 C
18 8.0 5.0 C
Based on the order of the sum of the last 3 y
values for each distinct category
value, I would like to append a number plus a dot as a prefix to the values of the category column. For the above example, those sums in ascending order would be 13
, 13.5
and 15.5
for B
, C
and A
respectively. The number to be prefixed to the categorical column values should correspondingly represent this order in accordance with the category
label, i.e. 1.B
, 2.C
, 3.A
should be the renamed values in category
for the former B
, C
and A
respectively. My desired output is shown below:
x y category
1 0.0 0.0 3.A
2 1.0 2.0 3.A
3 2.0 3.0 3.A
4 5.0 5.0 3.A
5 6.5 5.5 3.A
6 8.0 5.0 3.A
7 0.0 0.0 1.B
8 1.0 3.0 1.B
9 2.0 3.5 1.B
10 4.0 4.0 1.B
11 6.0 4.0 1.B
12 8.0 5.0 1.B
13 0.0 0.0 2.C
14 1.5 2.5 2.C
15 3.0 3.0 2.C
16 4.0 4.0 2.C
17 6.5 4.5 2.C
18 8.0 5.0 2.C
I'm able to collect the ordered sum values with corresponding category
values in a list, but am unable to think past this:
endsum <- list()
diff <- unique(df$category)
for(i in 1:length(diff)) {
endsum[diff[i]] <- sum(tail(subset(df, category == diff[i])$y, 3))
}
endsum <- endsum[order(sapply(endsum, function(a) a))]
> endsum
$B
[1] 13
$C
[1] 13.5
$A
[1] 15.5
I did think of comparing names(endsum)[i]
with df$category
while traversing df
row-wise and for matches, append the number maintained in the list order along with a dot, i.e. for e.g. when B
from names(endsum)[1]
gets matched with the data frame's category label B
, or rows 7-12 of df
, it will take its order in the list i
which is 1, and prefix it (with a dot) to rows 7-12 of df
, turning the B
into 1.B
.
However, am unsure of its implementation and this does not seem to be the best approach. Could anyone suggest a better way to achieve this? Help will be appreciated!
Upvotes: 3
Views: 191
Reputation: 26218
A one liner base R solution. Since there is no ties.method = "dense"
in rank()
in r, I first converted it to factor and then as.integer
to return the dense ranks.
df$category <- paste(as.integer(factor(rank(ave(df$y, df$category, FUN = function(x){n <- length(x); sum(x[(n-2):n])}),
ties.method = "min"))), df$category, sep = ".")
df
x y category
1 0.0 0.0 3.A
2 1.0 2.0 3.A
3 2.0 3.0 3.A
4 5.0 5.0 3.A
5 6.5 5.5 3.A
6 8.0 5.0 3.A
7 0.0 0.0 1.B
8 1.0 3.0 1.B
9 2.0 3.5 1.B
10 4.0 4.0 1.B
11 6.0 4.0 1.B
12 8.0 5.0 1.B
13 0.0 0.0 2.C
14 1.5 2.5 2.C
15 3.0 3.0 2.C
16 4.0 4.0 2.C
17 6.5 4.5 2.C
18 8.0 5.0 2.C
Upvotes: 1
Reputation: 168
Another soluntion using data.table:
library(data.table)
dt <- data.table(df)
# Create sum of last 3 rows and rank the sum
last3 <- dt[, .SD[c(.N-2, .N-1, .N)], by="category"]
order <- last3[ , .(endsum=sum(y)), by="category"][ , rank := rank(endsum)]
# Merge rank to dt and create prefix
dt_final <- merge(dt, order, by="category")
dt_final[ , category := paste0(rank, ".", category)]
Upvotes: 0
Reputation: 389012
For each group keep the last 3 rows and sum
their y
values, arrange
the data based on the sum and paste the row number to the category
column.
library(dplyr)
df %>%
group_by(category) %>%
slice_tail(n = 3) %>%
summarise(y = sum(y)) %>%
arrange(y) %>%
mutate(new_category = paste(row_number(), category, sep = '.')) %>%
select(-y) %>%
left_join(df, by = 'category')
# category new_category x y
# <chr> <chr> <dbl> <dbl>
# 1 B 1.B 0 0
# 2 B 1.B 1 3
# 3 B 1.B 2 3.5
# 4 B 1.B 4 4
# 5 B 1.B 6 4
# 6 B 1.B 8 5
# 7 C 2.C 0 0
# 8 C 2.C 1.5 2.5
# 9 C 2.C 3 3
#10 C 2.C 4 4
#11 C 2.C 6.5 4.5
#12 C 2.C 8 5
#13 A 3.A 0 0
#14 A 3.A 1 2
#15 A 3.A 2 3
#16 A 3.A 5 5
#17 A 3.A 6.5 5.5
#18 A 3.A 8 5
Probably this one is shorter with the same logic :
library(dplyr)
df %>%
group_by(category) %>%
mutate(sum_y = sum(tail(y, 3))) %>%
ungroup %>%
arrange(sum_y) %>%
mutate(category = paste(match(category, unique(category)),category, sep = '.'))
In base R this can be written as :
df1 <- aggregate(y~category, df, function(x) sum(tail(x, 3)))
df1 <- df1[order(df1$y), -2, drop = FALSE]
df1$new_category <- paste(seq(nrow(df1)), df1$category, sep = '.')
merge(df1, df, by = 'category')
Upvotes: 0