Reputation: 664
I am trying to run this merge function in R:
nomes <- '0'
dt <- data.frame("variable" = Level12R$level1.2_are.out$parameters$stdyx.standardized[,2])
for(i in 1:length(Level12R)) {
nomes[i] = names(Level12R)[i]
df = eval(parse(text=paste0("Level12R$",nomes[i],"$parameters$stdyx.standardized", collapse=NULL)))
df <- df[,2:3]
names(df)<-c("variable",toupper(substr(nomes[i],10,12)))
dt <- merge(x=dt, y=df, by = "variable", all.x = TRUE)
}
on the following data: Level12R.rds. The merge runs, but I get duplicate rows. Since the loop goes from 1 to 48, after a few cycles my dt object has millions of observations.
Any idea of what is wrong here? (I also tried dplyr::left_join and the same behavior occurs).
Thanks in advance for any comments.
João
Upvotes: 1
Views: 5822
Reputation: 664
Thanks guys! Yes, the problem was with the duplicate names in the df. So, my solution was:
nomes<-'0'
dt<- data.frame("variable" = paste0(Level12R$level1.2_are.out$parameters$stdyx.standardized[,1],
Level12R$level1.2_are.out$parameters$stdyx.standardized[,2],
Level12R$level1.2_are.out$parameters$stdyx.standardized[,8]))
for (i in 1:length(Level12R)){
nomes[i] = names(Level12R)[i]
df = eval(parse(text=paste0("Level12R$",nomes[i],"$parameters$stdyx.standardized", collapse=NULL)))
df<-df[,2:3]
names(df)<-c("variable",toupper(substr(nomes[i],10,12)))
n1 = eval(parse(text=paste0("Level12R$",nomes[i],"$parameters$stdyx.standardized","[1]", collapse=NULL)))
n2 = eval(parse(text=paste0("Level12R$",nomes[i],"$parameters$stdyx.standardized","[2]", collapse=NULL)))
n3 = eval(parse(text=paste0("Level12R$",nomes[i],"$parameters$stdyx.standardized","[8]", collapse=NULL)))
varnames <- paste0(n1[,],n2[,],n3[,])
df[,1]<-varnames
dt<-merge(x=dt, y=df, by = "variable", all.x = TRUE)
}
## So now a nice summary per variable
dt2 <- data.frame(t(dt[-1]))
colnames(dt2) <- dt[, 1]
skim(dt2)  # skim() comes from the skimr package
Upvotes: 0
Reputation: 47300
You have to take more columns from your source table to make sure your observations are uniquely identified.
We first build all these data.frames and then join them using Reduce:
dfs <- Map(Level12R, names(Level12R), f = function(x,y){
df <- x$parameters$stdyx.standardized[c(1:3,8)]
setNames(df,c("paramHeader","variable",toupper(substr(y,10,12)),"BetweenWithin"))
})
dt <- Reduce(function(x,y) merge(x,y,all.x=TRUE),dfs)
dt[1:5,1:8]
# paramHeader variable BetweenWithin ARE AUS AUT AZE BGR
# 1 ASRREA.ON ACBG03A Between -0.060 -0.140 -0.369 0.034 -0.002
# 2 ASRREA.ON ACBG12BD Between -0.041 -0.108 -0.003 -0.029 0.159
# 3 ASRREA.ON ACBG13H Between -0.121 -0.143 -0.106 -0.112 -0.011
# 4 ASRREA.ON ACBGDAS Between 0.143 0.112 0.009 -0.053 -0.086
# 5 ASRREA.ON ACBGEAS Between 0.031 0.088 0.116 -0.336 0.359
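Before merging, it can help to confirm that the chosen key columns really identify each row. A minimal sketch, using a small made-up table in place of one of the stdyx.standardized data frames (the values and the BetweenWithin column name are assumptions for illustration):

```r
# Hypothetical stand-in for one stdyx.standardized table.
tab <- data.frame(
  paramHeader   = c("ASRREA.ON", "Intercepts", "Residual.Variances"),
  param         = c("ACBG03A", "ASRREA", "ASRREA"),
  BetweenWithin = c("Between", "Between", "Between"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when there are no duplicates,
# otherwise the index of the first duplicated element/row.
dup_single <- anyDuplicated(tab$param)                                      # key = param only
dup_full   <- anyDuplicated(tab[c("paramHeader", "param", "BetweenWithin")]) # key = 3 columns

dup_single > 0   # TRUE: param alone is not a safe merge key
dup_full == 0    # TRUE: the three columns together are unique
```

Running the same check on each element of dfs before the Reduce call would flag any table whose key is still ambiguous.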
Upvotes: 1
Reputation: 23608
The problem is that the values in your "variable" column are not unique. Every merge on a non-unique key multiplies the matching rows, so you get more and more rows with each pass. Have a look at what you get when you do:
dt <- data.frame(Level12R$level1.2_are.out$parameters$stdyx.standardized[,1:2])
tail(dt)
paramHeader param
30 ASRREA.ON ATBR10CG
31 ASRREA.ON ATBR10DG
32 ASRREA.ON ATBR10FG
33 ASRREA.ON ATBR12AG
34 Intercepts ASRREA
35 Residual.Variances ASRREA
You can see that the last two variables are the same (ASRREA), but come from different headers.
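To see why this makes the row count explode, here is a small sketch with made-up data (the values and the C1..C3 column names are hypothetical). The key "ASRREA" appears twice on both sides, so each merge doubles the number of ASRREA rows:

```r
# Toy data: "ASRREA" occurs twice in the key column, once per header,
# mimicking the tail of the real table.
base <- data.frame(variable = c("ATBR12AG", "ASRREA", "ASRREA"),
                   stringsAsFactors = FALSE)
country <- data.frame(
  variable = c("ATBR12AG", "ASRREA", "ASRREA"),
  value    = c(0.10, 0.20, 0.30),
  stringsAsFactors = FALSE
)

dt <- base
rows <- integer(0)
for (i in 1:3) {
  y <- country
  names(y)[2] <- paste0("C", i)          # one new column per "country"
  dt <- merge(dt, y, by = "variable", all.x = TRUE)
  rows[i] <- nrow(dt)                    # record the size after each merge
}
rows
# 5 9 17
```

With 48 merges instead of 3, the same doubling is enough to reach millions of rows.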
So we have to extend the join key so that each record is unique. Looking at the data, that takes three columns: 1, 2 and 8 ("header", "variable" and "betweenwithin"). Then we can loop through everything without getting duplicate records. Your dt object ends up with 35 records and 51 variables, with NAs where a result table had fewer than 35 records (34, or even 25).
library(dplyr)  # for left_join()
nomes <- '0'
dt <- data.frame(Level12R$level1.2_are.out$parameters$stdyx.standardized[,c(1:2, 8)])
names(dt)<-c("header", "variable", "betweenwithin")
for(i in 1:length(Level12R)) {
nomes[i] = names(Level12R)[i]
df = eval(parse(text=paste0("Level12R$",nomes[i],"$parameters$stdyx.standardized", collapse=NULL)))
df <- df[,c(1:3, 8)]
names(df)<-c("header", "variable", toupper(substr(nomes[i],10,12)), "betweenwithin")
dt <- left_join(x=dt, y=df, by = c("header", "variable", "betweenwithin"))
}
Normally I would use a list object in a loop, and later on see what I need to do with the data in the list. It prevents creating unintended side effects when using joins / merges etc.
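A rough sketch of that list-first workflow, assuming a toy object with the same nesting as Level12R (make_model, toy and the estimates are invented for illustration; the toy key needs only two columns, whereas the real data also needs column 8):

```r
# Hypothetical stand-in for Level12R: a named list of model outputs.
make_model <- function(est) {
  list(parameters = list(stdyx.standardized = data.frame(
    paramHeader = c("ASRREA.ON", "Intercepts", "Residual.Variances"),
    param       = c("ACBG03A", "ASRREA", "ASRREA"),
    est         = est,
    stringsAsFactors = FALSE
  )))
}
toy <- list(
  level1.2_are.out = make_model(c(-0.060, 1.5, 0.9)),
  level1.2_aus.out = make_model(c(-0.140, 1.2, 0.8))
)

# Step 1: collect one data frame per model in a list; no joining yet.
pieces <- list()
for (nm in names(toy)) {
  df <- toy[[nm]]$parameters$stdyx.standardized   # [[ ]] avoids eval(parse())
  names(df)[3] <- toupper(substr(nm, 10, 12))     # e.g. "ARE", "AUS"
  pieces[[nm]] <- df
}

# Step 2: inspect the pieces before deciding how to combine them.
sapply(pieces, nrow)

# Step 3: join on the full key once the pieces look right.
dt <- Reduce(
  function(x, y) merge(x, y, by = c("paramHeader", "param"), all.x = TRUE),
  pieces
)
```

Keeping the pieces in a list means a surprising row count shows up in step 2, before any merge has had the chance to multiply rows.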
Upvotes: 2