Reputation: 207
I need to fill a matrix (MA) with information from a long data frame (DF) using another matrix as identifier (ID.MA).
An idea of my three matrices: MA.ID creates an identifier to look in the big DF the needed variables:
a b c
a ID.aa ID.ab ID.ac
b ID.ba ID.bb ID.bc
c ID.ca ID.cb ID.cc
The original big data frame has useless information but has also the rows that are useful for me to fill the target MA matrix:
ID 1990 1991 1992
ID.aa 10 11 12
ID.ab 13 14 15
ID.ac 16 17 18
ID.ba 19 20 21
ID.bb 22 23 24
ID.bc 25 26 27
ID.ca 28 29 30
ID.cb 31 32 33
ID.cc 34 35 36
ID.xx 40 40 55
ID.xy 50 51 45
....
MA should be filled with cross-information. In my example it should look like that for a chosen column of DF (let's say, 1990):
a b c
a 10 13 16
b 19 22 25
c 28 31 34
I've tried to use match but honestly it didn't work out:
MA$a = DF[match(MA.ID$a, DF$ID),2]
I was recommended to use the data.table
package but I couldn't see how that would help me.
Anyone has any good way to approach this problem?
Upvotes: 3
Views: 322
Reputation: 83215
Supposing that your input are dataframes, then you could do the following:
library(data.table)
setDT(ma)[, lapply(.SD, function(x) x = unlist(df[match(x,df$ID), "1990"]))
, .SDcols = colnames(ma)]
which returns:
a b c
1: 10 13 16
2: 19 22 25
3: 28 31 34
Explanation:
setDT(ma)
you transform the dataframe into a datatable (which is an enhanced dataframe)..SDcols=colnames(ma)
you specify on which columns the transformation has to be applied.lapply(.SD, function(x) x = unlist(df[match(x,df$ID),"1990"]))
performs the matching operation on each column specified with .SDcols
.An alternative approach with data.table
is first transforming ma
to a long data.table:
ma2 <- melt(setDT(ma), measure.vars = c("a","b","c"))
setkey(ma2, value) # set key by which 'ma' has to be indexed
setDT(df, key="ID") # transform to a datatable & set key by which 'df' has to be indexed
# joining the values of the 1990 column of df into
# the right place in the value column of 'ma'
ma2[df, value := `1990`]
which gives:
> ma2
variable value
1: a 10
2: b 13
3: c 16
4: a 19
5: b 22
6: c 25
7: a 28
8: b 31
9: c 34
The only drawback of this method is that the numeric values in the 'value' column get stored as character values. You can correct this by extending it as follows:
ma2[df, value := `1990`][, value := as.numeric(value)]
If you want to change it back to wide format you could use the rowid
function within dcast
:
ma3 <- dcast(ma2, rowid(variable) ~ variable, value.var = "value")[, variable := NULL]
which gives:
> ma3
a b c
1: 10 13 16
2: 19 22 25
3: 28 31 34
Used data:
ma <- structure(list(a = structure(1:3, .Label = c("ID.aa", "ID.ba", "ID.ca"), class = "factor"),
b = structure(1:3, .Label = c("ID.ab", "ID.bb", "ID.cb"), class = "factor"),
c = structure(1:3, .Label = c("ID.ac", "ID.bc", "ID.cc"), class = "factor")),
.Names = c("a", "b", "c"), class = "data.frame", row.names = c(NA, -3L))
df <- structure(list(ID = structure(1:9, .Label = c("ID.aa", "ID.ab", "ID.ac", "ID.ba", "ID.bb", "ID.bc", "ID.ca", "ID.cb", "ID.cc"), class = "factor"),
`1990` = c(10L, 13L, 16L, 19L, 22L, 25L, 28L, 31L, 34L),
`1991` = c(11L, 14L, 17L, 20L, 23L, 26L, 29L, 32L, 35L),
`1992` = c(12L, 15L, 18L, 21L, 24L, 27L, 30L, 33L, 36L)),
.Names = c("ID", "1990", "1991", "1992"), class = "data.frame", row.names = c(NA, -9L))
Upvotes: 2
Reputation: 6267
In base R, it can be seen as a job for outer
:
> outer(1:nrow(MA.ID), 1:ncol(MA.ID), Vectorize(function(x,y) {DF[which(DF$ID==MA.ID[x,y]),'1990']}))
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 19 22 25
[3,] 28 31 34
Explanations:
outer
creates a matrix as the outer product of the first argument X
(here a b c
) and the second argument Y
(here the same, a b c
)X
and Y
, it applies a function that looks up the value in DF
, at the row where the ID
is MA.ID[X,Y]
, and at the column 1990
Vectorize
, because outer
expects a vectorized functionAlternatively, another way to do it (still in base R) is:
MA.ID
into a vectorsapply
a quick function that looks up correspondance with DF$ID
This works:
> structure(
sapply(unlist(MA.ID),
function(id){DF[which(DF$ID==id),'1990']}),
dim=dim(MA.ID), names=NULL)
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 19 22 25
[3,] 28 31 34
(here the call to structure(..., dim=dim(MA.ID), names=NULL)
converts back the vector to a matrix)
Upvotes: 2