Catherine

Reputation: 161

Check certain number of repeats in dataframe rows

I want to find the rows in my dataframe that contain 1, 2, 3 and 4, each repeated exactly twice. When a row meets this criterion, I want to put a 1 in the judge1 column.
The code I have only gives me zeros in the judge1 column, even when a row meets the criterion:

a<-c(1, 2, 3, 4, 1, 2, 3, 4)
b<-c(1, 1, 1, 1, 2, 2, 2, 2)
df <- as.data.frame(rbind(a, b), stringsAsFactors = FALSE)

for (i in 1:nrow(df)){
  c<-as.data.frame(table(as.numeric(df[i, ])))
  
  if ( c[1, "Freq"]==2 & c[2, "Freq"]==2 & c[3, "Freq"]==2 & c[4, "Freq"]==2 )
  {df$judge=1}
  else 
  {df$judge=0}
}

I get all zeros in the judge1 column of df, but the first row of the judge1 column should be 1.

In the end, I will remove all the rows that do not meet my criterion (the row contains two repeats each of 1, 2, 3, 4). If anyone knows a way to do this without the intermediate step of creating the "judge1" column and removing the rows where it equals 0, that would help a lot.

Upvotes: 1

Views: 54

Answers (3)

ThomasIsCoding

Reputation: 101335

You can try the code below

vec <- 1:4
df$judge <- +(colSums(apply(df, 1, sort) == sort(rep(vec, 2))) == length(df))

which gives

> df
  V1 V2 V3 V4 V5 V6 V7 V8 judge
a  1  2  3  4  1  2  3  4     1
b  1  1  1  1  2  2  2  2     0

Explanation

  • Since you already specify the number of repetitions (twice) for each of the values 1:4, you can create the vector sort(rep(vec, 2)), in which every value occurs twice and the values are sorted in ascending order.

  • apply(df, 1, sort) sorts each row in ascending order as well and returns the sorted rows as the columns of a matrix, so apply(df, 1, sort) == sort(rep(vec, 2)) checks each row against the target sort(rep(vec, 2)).

  • If all values of a row match, the corresponding column is all TRUE, and colSums(...) == length(df) returns TRUE for that row. The leading + converts the logical result to 0/1.
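
The steps above can be unpacked into named intermediates. This is only a sketch for illustration: the names target, sorted_rows, matches and keep are not part of the answer, and the data frame is rebuilt from the question so the block runs on its own.

```r
# Rebuild the example data from the question
a <- c(1, 2, 3, 4, 1, 2, 3, 4)
b <- c(1, 1, 1, 1, 2, 2, 2, 2)
df <- as.data.frame(rbind(a, b), stringsAsFactors = FALSE)

vec <- 1:4
# Target pattern: every value twice, in ascending order
target <- sort(rep(vec, 2))

# Sorting each row with apply() yields a matrix with one COLUMN per row of df
sorted_rows <- apply(df, 1, sort)

# target is recycled down each column, giving a logical matrix of matches
matches <- sorted_rows == target

# A row qualifies only if every position matches the target
keep <- colSums(matches) == length(target)

# Unary plus turns TRUE/FALSE into 1/0, as in the one-liner
judge <- +keep
```

Here length(target) plays the role of length(df) in the one-liner; the two agree as long as df holds only the eight value columns when the check is run.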

Upvotes: 0

Rui Barradas

Reputation: 76402

Something like this?

vec <- 1:4
apply(df, 1, function(x){
  y <- table(factor(x, levels = vec))
  +all(y == 2 & vec %in% names(y))
})

#a b 
#1 0

And assign this result to the new column.

df$judge <- apply(df, 1, function(x){
  y <- table(factor(x, levels = vec))
  +all(y == 2 & vec %in% names(y))
})

#df
#  V1 V2 V3 V4 V5 V6 V7 V8 judge
#a  1  2  3  4  1  2  3  4     1
#b  1  1  1  1  2  2  2  2     0
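
If the judge column is only a stepping stone towards dropping rows, the same per-row test could be used as a logical index instead. A minimal sketch, assuming the data from the question; keep and df_kept are illustrative names, not part of the original code.

```r
# Rebuild the example data from the question
a <- c(1, 2, 3, 4, 1, 2, 3, 4)
b <- c(1, 1, 1, 1, 2, 2, 2, 2)
df <- as.data.frame(rbind(a, b), stringsAsFactors = FALSE)

vec <- 1:4
# TRUE for rows in which every value of vec occurs exactly twice
keep <- apply(df, 1, function(x) all(table(factor(x, levels = vec)) == 2))

# Subset directly; drop = FALSE keeps the result a data frame
df_kept <- df[keep, , drop = FALSE]
```

Because factor() is given explicit levels, a value that is absent from a row gets a count of 0 rather than disappearing from the table, so the all(... == 2) check covers the "missing value" case too.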

Upvotes: 0

Ronak Shah

Reputation: 388982

One way using apply:

values_to_check <- 1:4

df$judge <- apply(df, 1, function(x) {
  #count frequency for each unique value
  tab <- table(x)
  #Keep only the values present in values_to_check
  tab <- tab[names(tab) %in% values_to_check]
  #Check if all the values in values_to_check are present
  #and all those values occur exactly two times
  as.integer(all(values_to_check %in% names(tab)) & all(tab == 2))
})

df

#  V1 V2 V3 V4 V5 V6 V7 V8 judge
#a  1  2  3  4  1  2  3  4     1
#b  1  1  1  1  2  2  2  2     0
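
As for why the loop in the question ends up with all zeros: df$judge=1 and df$judge=0 assign to the whole column on every iteration, so the last row (b, which fails the test) overwrites the 1 set for row a. A possible repair that keeps the loop is sketched below; the helper vector judge and the use of factor() with fixed levels are assumptions for illustration, not the asker's code.

```r
# Rebuild the example data from the question
a <- c(1, 2, 3, 4, 1, 2, 3, 4)
b <- c(1, 1, 1, 1, 2, 2, 2, 2)
df <- as.data.frame(rbind(a, b), stringsAsFactors = FALSE)

values_to_check <- 1:4
# Collect results in a separate vector so that df[i, ] never
# contains a partially filled judge column
judge <- integer(nrow(df))

for (i in seq_len(nrow(df))) {
  # Fixed levels give a count (possibly 0) for every value to check
  counts <- table(factor(unlist(df[i, ]), levels = values_to_check))
  # Store the result for row i only, not for the whole column
  judge[i] <- as.integer(all(counts == 2))
}

df$judge <- judge
```

Writing into judge[i] rather than df$judge is the key change; everything else mirrors the original row-by-row approach.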

Upvotes: 1
