Reputation: 3375
I have a the following dataset :
> head(Dataset)
IDCLIENT NUMDOSS PROV_200112 PROV_200212 PROV_200312 ....
H44879 0104957 0 0 0 ....
Z353025 0105289 0 0 0 ....
B416771 0105309 0 0 0 ....
B5043 0105366 0 0 0 ....
A725471 0105370 0 0 0 ....
BE406627 0105371 0 0 0 ....
This a table of 125 columns and 250 000 obs
We have a table with two ID columns ID1, ID2 and the other columns register the values of a variables called Prov in a certain period of time from 2003 to 2017.
What I am trying to do is to transform this table so that I will have only 4 columns ( ID1, ID2, Date(or period) and Prov ) instead of 125 ones. Something like this :
>head(DF)
IDClient IDDossier Date Prov
B416771 0104957 PROV_200110 5
B416771 0104957 PROV_200111 0
B416771 0104957 PROV_200112 99
B416771 0104957 PROV_200212 1,23
H44879 0105289 PROV_200212 36,1
Z353025 0105309 PROV_200312 10436,175
BE4410 0105366 PROV_200406 10438,9
to do so I tried the following code, but it's really time consuming and I also got some errors/ warnings message from R:
# N= dim(ProvTab)[1]*dim(ProvTab)[2]
# DF=data.frame(IDClient=rep("",N),IDDossier=rep("",N),
# Date=rep("",N), Prov=rep("",N), stringsAsFactors = F)
DF=NULL
periodd=as.data.frame(colnames(ProvTab))
start.time= Sys.time() # to count running time !
for (k in 3:ncol(ProvTab))
{
for (j in 1:nrow(ProvTab))
{
DF=rbind(DF,
data.frame(IDClient=ProvTab[j,1], IDDossier=ProvTab[j,2],
Date = periodd[k,1] , Prov=ProvTab[j,k]))
}
}
end.time= Sys.time()
end.time- start.time
I really tried to look for other solutions but failed.
PS: R or SQL codes are welcomed !
Any suggestions ?
Upvotes: 1
Views: 234
Reputation: 7312
I think this is a pretty simple wide to long transformation. The reshape2
package is great for this.
require(reshape2)
DF <- melt(Dataset, id.vars= c("IDCLIENT", "NUMDOSS"))
names(DF) <- c("IDClient", "IDDossier", "Date", "Prov")
If that's slow, you can try with data.table
instead which is probably faster.
require(data.table)
setDT(Dataset)
DF <- melt(Dataset, id.vars= c("IDCLIENT", "NUMDOSS"))
Upvotes: 2