Reputation: 161
I want to find the rows in my data frame that contain the values 1, 2, 3 and 4, each repeated exactly twice. When such a row is found, I want to put a 1 in the judge1 column.
The code I have only gives me zeros in the judge1 column, even when the row meets the criterion:
a<-c(1, 2, 3, 4, 1, 2, 3, 4)
b<-c(1, 1, 1, 1, 2, 2, 2, 2)
df <- as.data.frame(rbind(a, b), stringsAsFactors = FALSE)
for (i in 1:nrow(df)){
  c<-as.data.frame(table(as.numeric(df[i, ])))
  if ( c[1, "Freq"]==2 & c[2, "Freq"]==2 & c[3, "Freq"]==2 & c[4, "Freq"]==2 )
  {df$judge=1}
  else
  {df$judge=0}
}
I get all zeros in the judge1 column of df, but the first row of the judge1 column should be 1.
In the end, I will remove all the rows that do not meet my criterion (the row contains exactly two of each of 1, 2, 3, 4). If anyone knows a way to do this without the intermediate step of creating the "judge1" column and then removing the rows where "judge1" is 0, that would help a lot.
Upvotes: 1
Views: 54
Reputation: 101335
You can try the code below
vec <- 1:4
df$judge <- +(colSums(apply(df, 1, sort) == sort(rep(vec, 2))) == length(df))
which gives
> df
  V1 V2 V3 V4 V5 V6 V7 V8 judge
a  1  2  3  4  1  2  3  4     1
b  1  1  1  1  2  2  2  2     0
Since you already specify the number of repetitions (twice) for all values in 1:4, you can create a vector sort(rep(vec, 2)) in which every value occurs twice and the values are sorted in ascending order.
apply(df, 1, sort) sorts each row in ascending order as well and returns the sorted rows as the columns of a matrix. apply(df, 1, sort) == sort(rep(vec, 2)) then checks whether each sorted row matches the target sort(rep(vec, 2)).
If all values of a row match, you get a column of all TRUEs, and colSums(...) == length(df) returns TRUE for that row.
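To make the explanation above concrete, here is a minimal sketch that splits the one-liner into its intermediate steps, using the example data from the question. The names target, sorted_rows and matches are introduced only for illustration, and the final comparison uses length(target) instead of length(df), which is the same number (8) for this data.

```r
# Rebuild the example data from the question
a <- c(1, 2, 3, 4, 1, 2, 3, 4)
b <- c(1, 1, 1, 1, 2, 2, 2, 2)
df <- as.data.frame(rbind(a, b), stringsAsFactors = FALSE)

vec <- 1:4
target <- sort(rep(vec, 2))        # 1 1 2 2 3 3 4 4

# apply() over rows returns one column per row of df (an 8 x 2 matrix here)
sorted_rows <- apply(df, 1, sort)

# target is recycled down each column, so every column is compared element-wise
matches <- sorted_rows == target

# A row qualifies only if all of its sorted values match the target
judge <- +(colSums(matches) == length(target))
judge
```

For row b only the first two sorted values agree with the target, so its column sum is 2 rather than 8 and its flag is 0.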
Upvotes: 0
Reputation: 76402
Something like this?
vec <- 1:4
apply(df, 1, function(x){
  y <- table(factor(x, levels = vec))
  +all(y == 2 & vec %in% names(y))
})
#a b
#1 0
And assign this result to the new column.
df$judge <- apply(df, 1, function(x){
  y <- table(factor(x, levels = vec))
  +all(y == 2 & vec %in% names(y))
})
#df
#  V1 V2 V3 V4 V5 V6 V7 V8 judge
#a  1  2  3  4  1  2  3  4     1
#b  1  1  1  1  2  2  2  2     0
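For comparison with the loop in the question: there, df$judge = 1 and df$judge = 0 assign to the entire column on every iteration, so the column ends up holding only the result for the last row. Below is a minimal sketch of the same table-based check written as a loop that stores one result per row. The names value_cols, counts and judge are illustrative, not part of the original code.

```r
# Rebuild the example data from the question
a <- c(1, 2, 3, 4, 1, 2, 3, 4)
b <- c(1, 1, 1, 1, 2, 2, 2, 2)
df <- as.data.frame(rbind(a, b), stringsAsFactors = FALSE)

vec <- 1:4
value_cols <- names(df)            # V1..V8, captured before judge is added
judge <- integer(nrow(df))         # one slot per row

for (i in seq_len(nrow(df))) {
  # Count each value of vec in row i; absent values get a count of 0
  counts <- table(factor(unlist(df[i, value_cols]), levels = vec))
  # Assign to element i only, not to the whole column
  judge[i] <- +all(counts == 2)
}

df$judge <- judge
df
```

Collecting the results in a separate vector and attaching it once at the end also keeps the judge column out of the values being counted.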
Upvotes: 0
Reputation: 388982
One way using apply:
values_to_check <- 1:4
df$judge <- apply(df, 1, function(x) {
  #Count the frequency of each unique value
  tab <- table(x)
  #Keep only the values present in values_to_check
  tab <- tab[names(tab) %in% values_to_check]
  #Check that all the values in values_to_check are present
  #and that each of those values occurs exactly two times
  as.integer(all(values_to_check %in% names(tab)) & all(tab == 2))
})
df
#  V1 V2 V3 V4 V5 V6 V7 V8 judge
#a  1  2  3  4  1  2  3  4     1
#b  1  1  1  1  2  2  2  2     0
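The question also asks whether the rows can be removed without going through a judge column. One possible sketch is to build a logical vector with the same kind of check and subset df with it directly. The helper has_each_n_times and the names keep and df_kept are hypothetical, introduced only for this example, and the extra length test (rejecting rows that hold anything besides the required values) is an assumption about what the criterion should mean.

```r
# Rebuild the example data from the question
a <- c(1, 2, 3, 4, 1, 2, 3, 4)
b <- c(1, 1, 1, 1, 2, 2, 2, 2)
df <- as.data.frame(rbind(a, b), stringsAsFactors = FALSE)

values_to_check <- 1:4

# Hypothetical helper: TRUE when every value in `values` occurs exactly
# `times` times in x and x contains nothing else
has_each_n_times <- function(x, values, times = 2) {
  counts <- table(factor(x, levels = values))
  all(counts == times) && length(x) == times * length(values)
}

# One logical per row; extra arguments are passed through apply() to the helper
keep <- apply(df, 1, has_each_n_times, values = values_to_check)

# Subset directly, with drop = FALSE so the result stays a data frame
df_kept <- df[keep, , drop = FALSE]
df_kept
```

With the example data only row a is kept.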
Upvotes: 1