WoolyThomas

Reputation: 47

Apply function to column in list using values in separate column

I'm trying to apply a function to every element of a list of two-dimensional data sets.

My data consists of measurements taken over time from many probes. I add a time index to each data set, and the index resets whenever the probe changes.

I have achieved this by splitting the list into individual data frames, but as my dataset grows I would like to use something from the lapply() family instead.

This is the approach that works for a single data frame:

source = c(1,1,1,2,2,2,3,3,3,4,4,4)
df1 = data.frame(source)
# number the rows within each probe (value of source): 1, 2, 3, ...
df1$elapsedTime <- ave(df1$source, df1$source, FUN = seq_along)

df1
# source elapsedTime
# 1       1           1
# 2       1           2
# 3       1           3
# 4       2           1
# 5       2           2
# 6       2           3
# 7       3           1
# 8       3           2
# 9       3           3
# 10      4           1
# 11      4           2
# 12      4           3
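One subtlety worth checking: `ave(..., FUN = seq_along)` numbers rows within each *value* of `source`, not within each contiguous run. If a probe ID can reappear later in the series and the index should restart every time the probe changes, a run-length version may match the description better. A minimal sketch, assuming the grouping column is an atomic vector in time order (`x` is hypothetical example data):

```r
x <- c(1, 1, 2, 2, 1)

# counts within each value of x: probe 1 keeps counting when it reappears
by_value <- ave(x, x, FUN = seq_along)
by_value
# [1] 1 2 1 2 3

# counts within each contiguous run: restarts whenever the value changes
by_run <- sequence(rle(x)$lengths)
by_run
# [1] 1 2 1 2 1
```

For the sorted `source` vector in the question both give identical results; they only differ when a probe ID recurs after a different one.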

I would like to use a function from the apply/Map family to run this process over a list of similar data frames from different experiments.

Upvotes: 0

Views: 49

Answers (2)

Bulat

Reputation: 6969

I think this should give you a basis for the lapply code you want:

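library(dplyr)  # provides bind_rows()

source = c(1,1,1,2,2,2,3,3,3,4,4,4)
df.in = data.frame(source)

# one data frame per probe
df.list <- split(df.in, f = df.in$source)
res <- lapply(df.list, function(df){
  # number the rows of this probe's data frame: 1, 2, 3, ...
  df$elapsedTime <- seq_along(df$source)
  return(df)
})
# stack the pieces back into a single data frame
df.out <- bind_rows(res)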

df.out
# source elapsedTime
# 1       1           1
# 2       1           2
# 3       1           3
# 4       2           1
# 5       2           2
# 6       2           3
# 7       3           1
# 8       3           2
# 9       3           3
# 10      4           1
# 11      4           2
# 12      4           3
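If you would rather not depend on dplyr for `bind_rows()`, base R can reassemble the pieces too. A sketch using `unsplit()`, which also restores the original row order (useful if `source` is not sorted); the variable names follow the split/lapply code above:

```r
source = c(1,1,1,2,2,2,3,3,3,4,4,4)
df.in = data.frame(source)

# one data frame per probe
df.list <- split(df.in, f = df.in$source)

# add a within-probe row counter to each piece
res <- lapply(df.list, function(df) {
  df$elapsedTime <- seq_along(df$source)
  df
})

# put the pieces back in their original row positions, no dplyr needed
df.out <- unsplit(res, df.in$source)
df.out
```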

Note that the data.table package also has dedicated functions for this, which can be handy for larger datasets. If you just want to do a calculation within each group, data.table makes it simpler:

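library(data.table)
dt = data.table(source)
# number the rows within each source group; dt[, elapsedTime := rowid(source)] is equivalent
dt[, elapsedTime := seq_len(.N), by = source]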

Upvotes: 1

missuse

Reputation: 19716

If I understand correctly, your data is a list of data frames like the one in your example. If so:

Data:

lis = list(df1 = data.frame(source = c(1,1,1,2,2,2,3,3,3,4,4,4)),
          df2 = data.frame(source = rep(1:5, each = 4)))

Function:

lapply(lis, function(x){
  # number rows within each value of the first column (source)
  elapsedTime = ave(x[,1], x[,1], FUN = seq_along)
  return(data.frame(x, elapsedTime))
})
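Since the question mentions the Map family: `Map()` is the natural choice when each data frame needs a second, matching input. A hedged sketch that walks the list and its names in parallel, tags each result, and stacks everything into one data frame; the `experiment` column is a hypothetical addition, not something the question asked for:

```r
lis = list(df1 = data.frame(source = c(1,1,1,2,2,2,3,3,3,4,4,4)),
           df2 = data.frame(source = rep(1:5, each = 4)))

# Map() calls the function with lis[[i]] and names(lis)[i] together
res <- Map(function(x, id) {
  x$elapsedTime <- ave(x$source, x$source, FUN = seq_along)
  x$experiment <- id  # hypothetical column recording which list element the rows came from
  x
}, lis, names(lis))

# stack all experiments into one data frame and reset the row names
all_runs <- do.call(rbind, res)
rownames(all_runs) <- NULL
head(all_runs)
```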

If I am mistaken, please comment.

Upvotes: 1
