Reputation: 19
I have several duplicate entries in a data set. I want to combine and then add the end column together. See below.
What I have:
Main | FLOW22016| FLOW2| Forest Lakes| 2016| 2016-10-03| 0| Creek chub| Semotilus atromaculatus| 1|
Main | FLOW22016| FLOW2| Forest Lakes| 2016| 2016-10-03| 0| Creek chub| Semotilus atromaculatus| 1|
What I want
Main | FLOW22016| FLOW2| Forest Lakes| 2016| 2016-10-03| 0| Creek chub| Semotilus atromaculatus | 2|
Is this possible? I have many data points like this throughout and I would like to combine.
Upvotes: 1
Views: 87
Reputation: 5788
Long form generic Base R to keep unique non-numeric vectors' values and sum the numeric vectors:
# Function to group data by key: .grouping_func => function
.grouping_func <- function(vec){
# Calculate the run length encoding: r_l_e => rle
r_l_e <- rle(vec)
# Expand it out into the rle_id: rle_id => integer vector
rle_id <- rep(
seq_along(r_l_e$values),
times = r_l_e$lengths
)
# Explicitly define the return object rle_id => GlobalEnv
return(rle_id)
}
# Function to resolve numeric vectors in a data.frame
# resolve_num_vecs => function
resolve_num_vecs <- function(df){
# Resolve which vectors are numeric:
# num_cols => logical vector
num_cols <- setNames(
vapply(
df,
is.numeric,
logical(1)
),
colnames(df)
)
# logical vector => Env
return(num_cols)
}
# Function to combine list of data.frames into df:
# df_list_2_df => function
df_list_2_df <- function(df_list, cmb_func = c(rbind, cbind)){
# Resolve the desired combination function:
# cmb_func_resolved => character scalar
cmb_func_resolved <- match.fun(cmb_func)
# Combine list of data.frames into a data.frame
# using a given combination function: res => data.frame
res <- data.frame(
do.call(
cmb_func_resolved,
df_list
),
row.names = NULL
)
# Explicitly define the returned object:
# data.frame => Env
return(res)
}
# Function to bucket the data.frame
# bucket_df => function
bucket_df <- function(df, logical_vector){
# Apply the grouping function: res => character vector
res <- .grouping_func(
apply(
df[,logical_vector],
1,
toString
)
)
# Explicitly define returned object:
# character vector => Env
return(res)
}
# Function to aggregate data.frame in specified manner:
# agg_func => function
agg_func <- function(df, logical_vector){
# Column bind unique non-numeric columns with
# the sum of numeric columns into a data.frame:
# res => data.frame
res <- cbind(
unique(df[,!logical_vector]),
setNames(
data.frame(
t(
colSums(df[,logical_vector])
)
),
names(logical_vector)[logical_vector]
)
)[,names(df)]
# Explicitly define returned object:
# data.frame => Env
return(res)
}
# Coerce the v6 vector to be of Date type:
# v6 => Date vector
dat$V6 <- as.Date(paste0(dat$V6, "-12-31"), "%Y-%m-%d")
# Resolve which vectors are numeric:
# num_cols => logical vector
num_cols <- resolve_num_vecs(dat)
# Split-apply-combine by group using the
# defined aggregate function: res => data.frame
res <- df_list_2_df(
lapply(
split(
dat,
bucket_df(
dat,
!num_cols
)
),
function(x){
agg_func(x, num_cols)
}
),
rbind
)
Upvotes: 1
Reputation: 72633
First, put the duplicates into bins with increasing numbers, count them and unique
the data frame.
In R>4.1 you could do:
dat <- transform(dat, bin=cumsum(!duplicated(dat[-(ncol(dat))]))) |>
transform(V13=ave(V13, bin, FUN=sum)) |>
unique()
dat
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 bin
# 1 Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03 0 Creek chub Semotilus atromaculatus 3 1
# 4 Main FLOW22016 FLOW2 Nile river 2016 2016-10-03 0 Creek chub Chelaethiops bibie 2 2
Data
dat <- read.table(header=TRUE, text='
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03 0 Creek chub Semotilus atromaculatus 1
Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03 0 Creek chub Semotilus atromaculatus 1
Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03 0 Creek chub Semotilus atromaculatus 1
Main FLOW22016 FLOW2 Nile river 2016 2016-10-03 0 Creek chub Chelaethiops bibie 1
Main FLOW22016 FLOW2 Nile river 2016 2016-10-03 0 Creek chub Chelaethiops bibie 1
')
Upvotes: 1
Reputation: 71560
You mean:
library(dplyr)
df %>%
group_by(lastcolumn) %>%
mutate(lastcolumn = sum(lastcolumn))
Upvotes: 0
Reputation: 15123
Try,
library(dplyr)
df %>%
group_by(across(c( - lastcolumn))) %>%
summarise(lastcolumn = n())
where lastcolumn
is the column that was 1 and needed to become two.
Upvotes: 1