Reputation: 792
I have a dataset "data" with 7 rows and 4 columns, as follows:
var1 var2 var3 var4
A    C    NA   NA
A    C    B    NA
B    A    C    D
D    B    NA   NA
NA   B    NA   NA
D    B    NA   NA
B    C    NA   NA
I want to create the following table "Mat" from this data:
A B C D
1 1
1 1 1
1 1 1 1
1 1
1
1 1
1 1 1
Basically, I took the unique elements of the original data and want to create a matrix "Mat" in which the number of rows equals the number of rows in "data" and the number of columns equals the number of unique elements in "data" (that is, A, B, C, D). A cell is 1 when that row of "data" contains the column's element.
I wrote the following code in R:
rule <- c("A", "B", "C", "D")
mat <- matrix(, nrow = dim(data)[1], ncol = dim(rule)[1])
mat <- data.frame(mat)
x <- rule[, 1]
nm <- as.character(x)
names(mat) <- nm
n_data <- dim(data)[1]
for (i in 1:n_data)
{
  for (j in 2:dim(data)[2])
  {
    for (k in 1:dim(mat)[2])
    {
      ifelse(data[i, j] == names(mat)[k], mat[i, k] == 1, 0)
    }
  }
}
I am getting all NA in "mat". The running time is also far too long: my real dataset has 20,000 rows, and "Mat" has 100 columns.
Any advice will be highly appreciated. Thanks!
Upvotes: 1
Views: 411
Reputation: 323366
Using table and rep:
table(rep(1:nrow(df),dim(df)[2]),unlist(df))
A B C D
1 1 0 1 0
2 1 1 1 0
3 1 1 1 1
4 0 1 0 1
5 0 1 0 0
6 0 1 0 1
7 0 1 1 0
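To spell out why this works, here is a minimal, self-contained sketch. It assumes the data frame is called df and that empty cells are read in as NA; the names row_id and values are only illustrative.

```r
# Same data as in the question, with empty cells as NA
df <- read.table(text = "var1 var2 var3 var4
A C NA NA
A C B NA
B A C D
D B NA NA
NA B NA NA
D B NA NA
B C NA NA", header = TRUE, stringsAsFactors = FALSE)

# unlist() flattens the data frame column by column, so the row index
# that belongs to each value is 1..nrow(df), repeated once per column.
row_id <- rep(seq_len(nrow(df)), times = ncol(df))
values <- unlist(df, use.names = FALSE)

# table() cross-tabulates row index against value; NA values are
# dropped by default, so empty cells contribute nothing.
mat <- table(row_id, values)

# If a value can occur more than once in a row, the counts can exceed 1;
# capping them gives a pure 0/1 indicator.
mat <- (mat > 0) + 0L
```

The result has one row per row of df and one column per distinct value, without hard-coding the values.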
Upvotes: 3
Reputation: 5211
This should be faster than the nested for loops:
sapply(c("A", "B", "C", "D"), function(x) { rowSums(df == x, na.rm = TRUE) })
# A B C D
# [1,] 1 0 1 0
# [2,] 1 1 1 0
# [3,] 1 1 1 1
# [4,] 0 1 0 1
# [5,] 0 1 0 0
# [6,] 0 1 0 1
# [7,] 0 1 1 0
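With around 100 distinct values it may be more convenient not to type the levels by hand. The following is a sketch of one possible variant, assuming the same df as in the Data section; the names m and lev are illustrative. It converts the data frame to a character matrix once and derives the levels from the data.

```r
df <- read.table(text = "var1 var2 var3 var4
A C NA NA
A C B NA
B A C D
D B NA NA
NA B NA NA
D B NA NA
B C NA NA", header = TRUE, stringsAsFactors = FALSE)

# Convert once to a character matrix so each comparison below
# works on a plain matrix rather than a data frame.
m <- as.matrix(df)

# Distinct non-missing values become the column names of the result.
lev <- sort(unique(m[!is.na(m)]))

# One indicator column per level: 1 if the row contains the value at all.
mat <- sapply(lev, function(x) as.integer(rowSums(m == x, na.rm = TRUE) > 0))
```

Here the > 0 step only matters if a value can appear more than once in a row; otherwise rowSums already returns 0 or 1.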
Data
df <- read.table(text = "var1 var2 var3 var4
A C NA NA
A C B NA
B A C D
D B NA NA
NA B NA NA
D B NA NA
B C NA NA", header = TRUE, stringsAsFactors = FALSE)
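As for why the loop in the question leaves "mat" full of NA: mat[i,k]==1 is a comparison, not an assignment, and the value returned by ifelse is discarded, so "mat" is never modified. In addition, rule is a plain vector, so dim(rule) is NULL and rule[,1] fails; length(rule) is what is needed. A minimal corrected version of the loop, as a sketch assuming the df above and empty cells stored as NA:

```r
df <- read.table(text = "var1 var2 var3 var4
A C NA NA
A C B NA
B A C D
D B NA NA
NA B NA NA
D B NA NA
B C NA NA", header = TRUE, stringsAsFactors = FALSE)

rule <- c("A", "B", "C", "D")

# Start from zeros instead of NA; length(rule), not dim(rule),
# gives the number of columns for a vector.
mat <- matrix(0L, nrow = nrow(df), ncol = length(rule),
              dimnames = list(NULL, rule))

for (i in seq_len(nrow(df))) {
  for (j in seq_len(ncol(df))) {
    val <- df[i, j]
    # Skip empty cells, and assign with <- (== only compares).
    if (!is.na(val) && val %in% rule) {
      mat[i, val] <- 1L
    }
  }
}
```

Indexing the column by name removes the innermost loop over k, but this is still element-by-element work, so the vectorised versions should scale better to 20,000 rows.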
Upvotes: 3