Reputation: 215
I have the following two sample data frames of different lengths with the same column names:
data1=data.frame('name'=c('siva','ramu','giri'),
'xx'=c(1,0,3))
name xx
1 siva 1
2 ramu 0
3 giri 3
data2=data.frame('name'=c('siva','ramya','giri','geetha','pallavi'),
'xx'=c(1,2,3,4,5))
name xx
1 siva 1
2 ramya 2
3 giri 3
4 geetha 4
5 pallavi 5
I want to compare each row of data1 (both columns together) with the corresponding row of data2. For example, the first row in data1 is the same as the first row in data2, so the result for this row should be TRUE. The same holds for row 3. For the other rows we should get FALSE.
I tried
library(arsenal)
comparedf(data1,data2)
Compare Object
Function Call:
comparedf(x = data1, y = data2)
Shared: 2 non-by variables and 3 observations.
Not shared: 0 variables and 2 observations.
Differences found in 2/2 variables compared.
0 variables compared have non-identical attributes.
Is that correct? If it is, I cannot interpret this output.
Upvotes: 0
Views: 68
Reputation: 19209
If you want to use the comparedf function, you need to summarise the results. Without a "by" argument, the data frames are compared row by row (as stated in the help page):
summary(comparedf(data1, data2))
This gives (after omitting some irrelevant output):
Table: Summary of data.frames
version arg ncol nrow
-------- ------ ----- -----
x data1 2 3
y data2 2 5
Table: Summary of overall comparison
statistic value
------------------------------------------------------------ ------
Number of by-variables 0
Number of non-by variables in common 2
Number of variables compared 2
Number of variables in x but not y 0
Number of variables in y but not x 0
Number of variables compared with some values unequal 2
Number of variables compared with all values equal 0
Number of observations in common 3
Number of observations in x but not y 0
Number of observations in y but not x 2
Number of observations with some compared variables unequal 1
Number of observations with all compared variables equal 2
Number of values unequal 2
Table: Observations not shared
version ..row.names.. observation
-------- -------------- ------------
y 4 4
y 5 5
Table: Differences detected by variable
var.x var.y n NAs
------ ------ --- ----
name name 1 0
xx xx 1 0
Table: Differences detected
var.x var.y ..row.names.. values.x values.y row.x row.y
------ ------ -------------- --------- --------- ------ ------
name name 2 ramu ramya 2 2
xx xx 2 0 2 2 2
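Since the comparison above is positional, it may help to see the same call with a "by" argument for contrast. This is only a sketch: whether matching on name is what the question needs is an assumption, and the variable name cmp_by_name is made up for illustration.

```r
library(arsenal)

data1 <- data.frame(name = c("siva", "ramu", "giri"),
                    xx = c(1, 0, 3),
                    stringsAsFactors = FALSE)
data2 <- data.frame(name = c("siva", "ramya", "giri", "geetha", "pallavi"),
                    xx = c(1, 2, 3, 4, 5),
                    stringsAsFactors = FALSE)

# Assumption: rows should be paired by the value of "name"
# instead of by row position (the default when "by" is omitted).
cmp_by_name <- comparedf(data1, data2, by = "name")

# The summary now reports names found in only one data frame
# under "Observations not shared".
print(summary(cmp_by_name))
```

With "by" set, a name that appears in only one data frame is reported as a non-shared observation rather than as a difference in values.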
Upvotes: 1
Reputation: 133
This might not be the quickest way, but it keeps things simple.
We cut off the longer data frame because both must have the same number of rows for the == comparison.
data1=data.frame('name'=c('siva','ramu','giri'),
'xx'=c(1,0,3))
data2=data.frame('name'=c('siva','ramya','giri','geetha','pallavi'),
'xx'=c(1,2,3,4,5))
length1 <- nrow(data1)
length2 <- nrow(data2)
# Find the length of the shorter dataframe
shorter_length <- min(length1, length2)
# Cutoff the dataframes to that length
data1_short <- head(data1, shorter_length)
data2_short <- head(data2, shorter_length)
# Create a vector of booleans showing which rows match
row_matches <- apply(data1_short == data2_short, 1, all)
> print(row_matches)
1 2 3
TRUE FALSE TRUE
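The question also asks for FALSE on the rows that have no counterpart. One possible variation, sketched in base R, pads the result to the length of the longer data frame. The helper name compare_rows is hypothetical, and it assumes both data frames have the same columns in the same order.

```r
data1 <- data.frame(name = c("siva", "ramu", "giri"),
                    xx = c(1, 0, 3),
                    stringsAsFactors = FALSE)
data2 <- data.frame(name = c("siva", "ramya", "giri", "geetha", "pallavi"),
                    xx = c(1, 2, 3, 4, 5),
                    stringsAsFactors = FALSE)

# Hypothetical helper: positional row comparison, padded with FALSE
# for rows that exist in only one of the two data frames.
compare_rows <- function(a, b) {
  n_common <- min(nrow(a), nrow(b))
  n_total <- max(nrow(a), nrow(b))
  result <- logical(n_total)  # initialised to all FALSE
  if (n_common > 0) {
    idx <- seq_len(n_common)
    # Element-wise comparison of the overlapping rows (logical matrix)
    same <- a[idx, , drop = FALSE] == b[idx, , drop = FALSE]
    # A row matches only if every column matches
    result[idx] <- rowSums(same) == ncol(same)
  }
  result
}

row_matches <- compare_rows(data1, data2)
print(row_matches)
```

For the sample data this yields TRUE FALSE TRUE FALSE FALSE. Missing values (NA) are not handled in this sketch.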
Upvotes: 1