Reputation: 75
I have the data frame below. I want to compare a pair of columns with the same pair in reversed order, that is, the entries of columns 1:2 in one row against the entries of columns 2:1 in another row. Where the two pairs match, I want their frequency counts to be added together for that pair.
z <- c(3,3,2)
y <- c(1,2,3)
x <- data.frame(y,z)
library(plyr)
fr <- count(x[,1:2])
fr
# The matched pair of 1:2 with 2:1
fr[3,1:2] == fr[2,2:1]
My desired output is the dataframe that contains the sum of frequency count of the matched pair.
y z freq
1 1 3 1
2 2 3 2
Upvotes: 1
Views: 631
Reputation: 886948
We can do this with base R. We transform the dataset by replacing the 'y' column with the minimum of 'y' and 'z' for each row (using pmin) and the 'z' column with the maximum of 'y' and 'z' for each row (using pmax), and create a new column 'freq' with 1 as its value. Then we use xtabs to get the sum of 'freq' by 'y' and 'z' (xtabs sums by default) and convert the result to a data.frame (as.data.frame).
as.data.frame(xtabs(freq~., transform(x, y= pmin(y,z),
z= pmax(y,z), freq=1)))
# y z Freq
#1 1 3 1
#2 2 3 2
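One caveat with the xtabs route: as.data.frame on a table keeps every combination of the 'y' and 'z' levels, so with more varied data some rows can come back with a Freq of 0. A minimal sketch of dropping them, assuming a slightly larger made-up data frame (x2 is hypothetical and not from the question):

```r
# Hypothetical example data (not from the question): one extra pair (1, 2)
x2 <- data.frame(y = c(1, 2, 3, 1), z = c(3, 3, 2, 2))

# Same approach as above: order each pair, then cross-tabulate
res <- as.data.frame(xtabs(freq ~ ., transform(x2, y = pmin(y, z),
                                               z = pmax(y, z), freq = 1)))

# as.data.frame() on a table returns all level combinations,
# so keep only the pairs that actually occur
res <- subset(res, Freq > 0)
res
```

Note that 'y' and 'z' come back as factors here, so they may need converting if numeric columns are required.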
Another option is to loop over the rows with apply using MARGIN=1, sort the elements of each row, and then use aggregate to get the sum grouped by 'y' and 'z'.
x[] <- t(apply(x, 1, sort))
aggregate(Freq~., transform(x, Freq=1), sum)
# y z Freq
#1 1 3 1
#2 2 3 2
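Since the x[] <- assignment above overwrites 'x' in place, it may be handy to wrap the idea in a function that leaves the input untouched. This is only a sketch: the name count_unordered_pairs is made up for illustration, and it assumes a data frame with exactly two comparable columns.

```r
# Hypothetical helper (name is illustrative): count rows of a two-column
# data frame, treating (a, b) and (b, a) as the same pair
count_unordered_pairs <- function(df) {
  stopifnot(ncol(df) == 2)
  # Put the smaller value first and the larger second in every row
  ordered <- data.frame(pmin(df[[1]], df[[2]]), pmax(df[[1]], df[[2]]))
  names(ordered) <- names(df)
  ordered$Freq <- 1
  # Sum the 1s within each distinct ordered pair
  aggregate(Freq ~ ., ordered, sum)
}

# Data from the question
x <- data.frame(y = c(1, 2, 3), z = c(3, 3, 2))
out <- count_unordered_pairs(x)
out
```

Using pmin and pmax here avoids the row-wise apply, which could matter on larger data.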
Upvotes: 2