Reputation: 559
I have a large data frame with 2 rows and 30406 columns. I need to count the number of columns in which a 0 is present in both rows (match) and the number of columns in which a 0 is present in one row but not in the other (no match).
I think looping through everything and comparing each column will take too long, given that there are more than 30k columns.
head(to_compare)[1:5]
         bin:82154:182154 bin:82154:282154 bin:82154:382154 bin:82154:482154 bin:82154:582154
1-D1.txt                0                1                2                0                0
1-D2.txt                1                1                1                1                0
output
match no_match
    1        1
Upvotes: 0
Views: 107
Reputation: 39647
set.seed(7)
n <- 30406
to_compare <- data.frame(matrix(floor(runif(n*2, 0, 3)), nrow = 2))
table(colSums(to_compare==0))
#     0     1     2
# 13519 13513  3374
#
#0..no zero in column (13519)
#1..one row in column has a zero (13513)
#2..both rows in column are zero (3374)
system.time(table(colSums(to_compare==0)))
#    user  system elapsed
#   0.332   0.000   0.330
Upvotes: 1
Reputation: 1253
A different and very simple approach would be to first transpose the data frame (switch columns to rows) and then just use rowSums:
#Create sample df
df <- data.frame(col1 = c(0,1), col2 = c(1,0), col3 = c(1,1), col4 = c(0,2), col5 = c(3,0), col6 = c(0,0))
#Convert columns to rows
df_long <- t(df)
#Count number of 0s in every row and show in table of 0, 1 or 2 zeros
table(rowSums(df_long == 0))
0 1 2
1 4 1
Upvotes: 1
Reputation: 5138
You could use colSums for a vectorized solution:
set.seed(123)
df <- as.data.frame(matrix(round(runif(50, 0, 2)), nrow = 2))
# Match
sum(colSums(df==0) == 2)
[1] 2
# No match
sum(colSums(df==0) == 1)
[1] 8
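The two sums above can be wrapped in a small helper that returns both counts at once. This is only a sketch: the function name count_zero_matches and the six-column sample data are made up for illustration, and it assumes exactly two rows and no NA values.

```r
# Hypothetical helper (not part of the original answer): counts columns
# where both rows are 0 (match) and where exactly one row is 0 (no_match).
count_zero_matches <- function(x) {
  m <- as.matrix(x)                  # compare on a matrix, not column by column
  zeros_per_col <- colSums(m == 0)   # 0, 1 or 2 zeros in each column
  c(match    = sum(zeros_per_col == 2),
    no_match = sum(zeros_per_col == 1))
}

# Small illustrative sample with two rows and six columns
sample_df <- data.frame(col1 = c(0, 1), col2 = c(1, 0), col3 = c(1, 1),
                        col4 = c(0, 2), col5 = c(3, 0), col6 = c(0, 0))
res <- count_zero_matches(sample_df)
print(res)
```

Counting the zeros per column once and reusing the result avoids running the comparison over all 30k columns twice.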
Upvotes: 3
Reputation: 469
set.seed(123)
df <- as.data.frame(matrix(round(runif(10, 0, 2)), nrow = 2))
# Count the columns in which both rows are 0
sum(apply(df, 2, function(x) all(x == 0))) # Match
# Count the columns in which a 0 is present in one row but not in the other
sum(apply(df, 2, function(x) any(x == 0) & (x[1] != x[2]))) # No match
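As a quick sanity check, the apply() counts can be compared against a vectorized colSums version on the same data. This is a sketch that assumes exactly two rows (the x[1] != x[2] test does not generalize beyond that); the variable names are chosen here for illustration.

```r
set.seed(123)
df <- as.data.frame(matrix(round(runif(10, 0, 2)), nrow = 2))

# apply()-based counts, as in the answer above
match_apply    <- sum(apply(df, 2, function(x) all(x == 0)))
no_match_apply <- sum(apply(df, 2, function(x) any(x == 0) & (x[1] != x[2])))

# Vectorized equivalent: count the zeros per column once, then compare
zeros        <- colSums(df == 0)
match_vec    <- sum(zeros == 2)
no_match_vec <- sum(zeros == 1)

# Both approaches should agree for a two-row data frame
stopifnot(match_apply == match_vec, no_match_apply == no_match_vec)
```

With two rows, "any zero and the rows differ" is the same condition as "exactly one zero", which is why the two versions give identical counts.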
Upvotes: 1