newfinder

Reputation: 215

Compare two pairs of columns with different lengths

I have the following two sample data frames of different lengths with the same column names:

 data1=data.frame('name'=c('siva','ramu','giri'), 
            'xx'=c(1,0,3))



  name xx
1 siva  1
2 ramu  0
3 giri  3



data2=data.frame('name'=c('siva','ramya','giri','geetha','pallavi'), 
               'xx'=c(1,2,3,4,5))
    name xx
1    siva  1
2   ramya  2
3    giri  3
4  geetha  4
5 pallavi  5

I want to compare the pair of columns in data1 with the corresponding pair of columns in data2, row by row. For example, the first row of data1 is the same as the first row of data2, so the result for this row should be TRUE. The same holds for row 3. For the other rows we should get FALSE.

I tried

library(arsenal)
comparedf(data1,data2)
Compare Object

Function Call: 
comparedf(x = data1, y = data2)

Shared: 2 non-by variables and 3 observations.
Not shared: 0 variables and 2 observations.

Differences found in 2/2 variables compared.
0 variables compared have non-identical attributes

Is that correct? If it is, I cannot interpret this output.

Upvotes: 0

Views: 68

Answers (2)

Edward

Reputation: 19209

Without a "by" argument, comparedf compares the data frames row by row (as stated in the help page). To see the details of the comparison, you need to summarise the result:

summary(comparedf(data1, data2))

This gives (after omitting some irrelevant output):

Table: Summary of data.frames

version   arg      ncol   nrow
--------  ------  -----  -----
x         data1       2      3
y         data2       2      5

Table: Summary of overall comparison

statistic                                                      value
------------------------------------------------------------  ------
Number of by-variables                                             0
Number of non-by variables in common                               2
Number of variables compared                                       2
Number of variables in x but not y                                 0
Number of variables in y but not x                                 0
Number of variables compared with some values unequal              2
Number of variables compared with all values equal                 0
Number of observations in common                                   3
Number of observations in x but not y                              0
Number of observations in y but not x                              2
Number of observations with some compared variables unequal        1
Number of observations with all compared variables equal           2
Number of values unequal                                           2

Table: Observations not shared

version    ..row.names..   observation
--------  --------------  ------------
y                      4             4
y                      5             5

Table: Differences detected by variable

var.x   var.y     n   NAs
------  ------  ---  ----
name    name      1     0
xx      xx        1     0

Table: Differences detected

var.x   var.y    ..row.names..  values.x   values.y    row.x   row.y
------  ------  --------------  ---------  ---------  ------  ------
name    name                 2  ramu       ramya           2       2
xx      xx                   2  0          2               2       2
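
To turn this comparison into the TRUE/FALSE vector asked for in the question, one option is to pull the differing rows out with diffs() from arsenal and flag every row of data1 that is not listed there. This is only a sketch: it assumes data1 is the shorter data frame (rows that exist in only one data frame are reported as "not shared" rather than as differences), and the name row_matches is just illustrative.

```r
library(arsenal)  # assumes the arsenal package is installed

data1 <- data.frame(name = c("siva", "ramu", "giri"), xx = c(1, 0, 3))
data2 <- data.frame(name = c("siva", "ramya", "giri", "geetha", "pallavi"),
                    xx = c(1, 2, 3, 4, 5))

# Row-by-row comparison (no "by" argument)
cmp <- comparedf(data1, data2)

# One line per differing value; row.x is the row number in data1
d <- diffs(cmp)

# A row of data1 matches if none of its values is listed as a difference
row_matches <- !(seq_len(nrow(data1)) %in% d$row.x)
print(row_matches)
```

For the sample data, only row 2 appears in the differences table, so rows 1 and 3 are flagged as matching.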

Upvotes: 1

Matt0706

Reputation: 133

This might not be the quickest way, but it does keep it simple.

We cut off the longer data frame because both data frames must have the same number of rows for the == comparison.

data1=data.frame('name'=c('siva','ramu','giri'), 
             'xx'=c(1,0,3))

data2=data.frame('name'=c('siva','ramya','giri','geetha','pallavi'), 
                 'xx'=c(1,2,3,4,5))


length1 <- nrow(data1)
length2 <- nrow(data2)

# Find the number of rows of the shorter data frame
shorter_length <- min(length1, length2)

# Cut both data frames down to that length
data1_short <- head(data1, shorter_length)
data2_short <- head(data2, shorter_length)

# Create a vector of booleans showing which rows match
row_matches <- apply(data1_short == data2_short, 1, all)


> print(row_matches)
    1     2     3 
 TRUE FALSE  TRUE 
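
If the position of a row should not matter, and the question is only whether each name/xx pair of data1 occurs anywhere in data2, a possible base R sketch is to paste each row into a single key and test membership with %in%. Note that this is a different notion of matching than the positional comparison above. The names key1, key2 and pair_in_data2 are illustrative, and the separator "\r" is an assumption, chosen because it is unlikely to occur in the data.

```r
data1 <- data.frame(name = c("siva", "ramu", "giri"), xx = c(1, 0, 3))
data2 <- data.frame(name = c("siva", "ramya", "giri", "geetha", "pallavi"),
                    xx = c(1, 2, 3, 4, 5))

# Build one key per row from both columns
key1 <- paste(data1$name, data1$xx, sep = "\r")
key2 <- paste(data2$name, data2$xx, sep = "\r")

# TRUE where the (name, xx) pair of data1 occurs in any row of data2
pair_in_data2 <- key1 %in% key2
print(pair_in_data2)
```

For the sample data this flags the same rows as the positional comparison, but the two approaches would disagree if, for example, the rows of data2 were in a different order.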

Upvotes: 1
