Reputation: 367
I have 2 data frames that I'm trying to keep separate, but would like to calculate a new column in df1 based on the means of rows in df2:
df1:
Name <- c("t1", "t2", "t3")
Diam <- c(95, 104, 72)
Rad <- c(47.5, 52, 36)
df1 <- data.frame(Name, Diam, Rad)
df2:
Name <- c("t1A", "t1B", "t2A", "t2B", "t2C", "t3B")
Width <- c(4000, 3200, 2300, 2700, 2100, 3500)
df2 <- data.frame(Name, Width)
I'd like to add a fourth column to df1 that calculates the means of "Width" in df2 when the first 2 characters of "Name" are identical. My intended output looks like:
df3:
Name <- c("t1", "t2", "t3")
Diam <- c(95, 104, 72)
Rad <- c(47.5, 52, 36)
WidthMean <- c(3600, 2366.667, 3500)
df3 <- data.frame(Name, Diam, Rad, WidthMean)
I can do it by splitting the "Name" column in df2, then finding the mean according to the new split name, then adding the mean values from df2 as a new column to df1. However, I have to do this to approximately 75 data.frames and was hoping to find a faster way to process the data (and with fewer lines of code!).
Current code:
df2$"NameID" <- df2$Name
df2$NameID <- gsub(".?$", "", df2$NameID)
df2out <- aggregate(Width ~ NameID, data = df2, FUN = mean)
df1$"WidthMean" <- df2out$Width
df1
Upvotes: 2
Views: 2133
Reputation: 3622
Here's a solution using dplyr
and a loop to read in & merge your files
library(dplyr)
library(gdata)
setwd('TO_THE_FOLDER_WHERE_ALL_YOUR_FILES_ARE')
names <- list.files()
names <- sub('dbh.csv', '', names)
names <- sub('bai.csv', '', names)
uniqueNames <- unique(names)
for(uniqueName in uniqueNames){
df1_name <- paste0(uniqueName, 'dbh.csv')
df1 <- read.csv(df1_name)
df2_name <- paste0(uniqueName, 'bai.csv')
df2 <- read.csv(df2_name)
df2 %>%
mutate(partialName = str_sub(Name, 1, 2)) %>%
group_by(partialName) %>%
dplyr::summarise(meanWidth = mean(Width)) %>%
merge(df1, by.x = 'partialName', by.y = 'Name') %>%
select(partialName, Diam, Rad, meanWidth) %>%
rename.vars('partialName', 'Name')
assign(df2, uniqueName)
#write.csv(df2, paste0(uniqueName, '.csv'), row.names = F)
}
Upvotes: 2
Reputation: 6749
If you want to use basic R, you can write a simple function that does this and then use a for
loop (or potentially apply
) to do this.
func.widthmeans <- function(prefix,target.df) {
active.df <- get(target.df)
return(mean(active.df[grep(pattern=prefix,x=active.df$Name),"Width"]))
}
for(x in df1$Name) {
df1[df1$Name==x,"MeanWidth"] <- func.widthmeans(prefix=x,target.df="df2")
}
rm(x)
df1
Name Diam Rad MeanWidth
1 t1 95 47.5 3600.000
2 t2 104 52.0 2366.667
3 t3 72 36.0 3500.000
This is a more generalized solution that is a function that accepts two data frames, one with the general names and one with the width data, like you described in the comments.
func.widthmeans <- function(data.df,width.df) {
for(x in data.df$Name) {
data.df[data.df$Name==x,"MeanWidth"] <- mean(width.df[grep(pattern=x,x=width.df$Name),"Width"])
}
rm(x)
return(data.df)
}
func.widthmeans(data.df = df1,width.df = df2)
Name Diam Rad MeanWidth
1 t1 95 47.5 3600.000
2 t2 104 52.0 2366.667
3 t3 72 36.0 3500.000
Upvotes: 1
Reputation: 124734
Here's a solution to get df3
using sapply
:
df3 <- df1
df3$WidthMean <- sapply(df1$Name, function(name) {
pattern = paste('^', name, sep='')
mean(subset(df2, grepl(pattern, Name))$Width)
})
df3
Outputs:
Name Diam Rad WidthMean
1 t1 95 47.5 3600.000
2 t2 104 52.0 2366.667
3 t3 72 36.0 3500.000
If you have 75 dfs that look like df1
, you can use lapply
to do them all:
lapply(list(df1, df1), function(df) {
df_out <- df
df_out$WidthMean <- sapply(df$Name, function(name) {
pattern = paste('^', name, sep='')
mean(subset(df2, grepl(pattern, Name))$Width)
})
df_out
})
Outputs:
[[1]]
Name Diam Rad WidthMean
1 t1 95 47.5 3600.000
2 t2 104 52.0 2366.667
3 t3 72 36.0 3500.000
[[2]]
Name Diam Rad WidthMean
1 t1 95 47.5 3600.000
2 t2 104 52.0 2366.667
3 t3 72 36.0 3500.000
Upvotes: 2