Reputation: 1270

How to check two files in R

I have two files and I want to make sure these two files are the same.

File 1:

df1<-read.table (text=" class   type    colour  gender
12  1   yellow  F
11  1   green   M
14  2   red M
18  2   red F
16  1   red F


", header=TRUE)

File 2:

df2<-read.table (text=" class   type    colour  gender
12  1   yellow  F
11  2   gree    M
14  2   red N
18  2   red F
18  1   red F


", header=TRUE)

As we can see, df2 contains four errors. For example, the class in the last row of df2 should be 16, as in df1, not 18. Wherever the values in df1 and df2 are not equal, I want to get FALSE, and I also want the number of errors. The expected outcome is shown below, with Error = 4. I have nearly 100 columns, so this is just a small sample of the data.

out<-read.table (text=" class   type    colour  gender
12  1   yellow  F
11  FALSE   FALSE   M
14  2   red FALSE
18  2   red F
FALSE   1   red F

", header=TRUE)

Error =4

Upvotes: 3

Answers (5)

ThomasIsCoding

Reputation: 101247

You can try

> ifelse(df1 == df2, as.matrix(df1), FALSE)
     class   type    colour   gender 
[1,] "12"    "1"     "yellow" "F"
[2,] "11"    "FALSE" "FALSE"  "M"
[3,] "14"    "2"     "red"    "FALSE"
[4,] "18"    "2"     "red"    "F"
[5,] "FALSE" "1"     "red"    "F"

If you want to count how many errors

> table(df1 == df2)

FALSE  TRUE
    4    16

> sum(df1 != df2)
[1] 4
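With around 100 columns, it may also help to see where the mismatches are rather than only how many there are. A minimal sketch, assuming both data frames have the same dimensions and column order and contain no factor columns; the names diff_pos and per_col are only illustrative:

```r
# Same sample data as in the question, read without factors
df1 <- read.table(text = "class type colour gender
12 1 yellow F
11 1 green M
14 2 red M
18 2 red F
16 1 red F", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "class type colour gender
12 1 yellow F
11 2 gree M
14 2 red N
18 2 red F
18 1 red F", header = TRUE, stringsAsFactors = FALSE)

# Row/column position of every cell that differs
diff_pos <- which(df1 != df2, arr.ind = TRUE)
nrow(diff_pos)   # 4 mismatching cells

# Number of mismatches per column
per_col <- colSums(df1 != df2)
per_col          # one mismatch in each of the four columns
```

The column names of the offending cells can then be looked up with names(df1)[diff_pos[, "col"]].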

Upvotes: 2

nniloc

Reputation: 4243

A tweak to Daniel's answer

chck <- mapply(function(x,y){
  x[x != y] <- 'FALSE'
  x
}, df1, df2)

chck <- data.frame(chck)

#-------
  class  type colour gender
1    12     1 yellow      F
2    11 FALSE  FALSE      M
3    14     2    red  FALSE
4    18     2    red      F
5 FALSE     1    red      F

sum(chck == 'FALSE')
# 4

Upvotes: 2

Jeff Bezos

Reputation: 2253

You can loop over columns with mapply and generate a logical matrix. You just have to make sure your dataframes don't contain any factors:

Create data:

df1<-read.table (text=" class   type    colour  gender
12  1   yellow  F
11  1   green   M
14  2   red M
18  2   red F
16  1   red F
", header=TRUE, as.is = TRUE)

df2<-read.table (text=" class   type    colour  gender
12  1   yellow  F
11  2   gree    M
14  2   red N
18  2   red F
18  1   red F
", header=TRUE, as.is = TRUE)

Compare dataframes:

match <- mapply(`==`, df1, df2)
table(match)
# FALSE  TRUE 
#    4    16
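If the data were already read in with factor columns (the default before R 4.0), one option is to convert them to character before comparing. A small sketch under that assumption; to_chr is a hypothetical helper and f1/f2 are made-up sample data, not the data from the question:

```r
# Hypothetical helper: convert factor columns to character, leave others untouched
to_chr <- function(d) {
  d[] <- lapply(d, function(col) if (is.factor(col)) as.character(col) else col)
  d
}

# Made-up sample data with factor columns whose levels differ
f1 <- data.frame(colour = factor(c("yellow", "green", "red")), type = c(1L, 1L, 2L))
f2 <- data.frame(colour = factor(c("yellow", "gree", "red")), type = c(1L, 2L, 2L))

# Column-wise comparison after the conversion
eq <- mapply(`==`, to_chr(f1), to_chr(f2))
n_errors <- sum(!eq)
n_errors   # 2
```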

Upvotes: 2

lumartor

Reputation: 719

A simple solution with base R:

text1 <- unname(unlist(lapply(df1, as.character)))
text2 <- unname(unlist(lapply(df2, as.character)))

# number of differences
sum(text1 != text2)

You can make a function:

n_diff <- function(a, b){
  a <- unname(unlist(lapply(a, as.character)))
  b <- unname(unlist(lapply(b, as.character)))
  n <- sum(a != b)
  print(paste0("Error = ", n))
  n == 0
}

and the output:

> n_diff(df1, df2)
[1] "Error = 4"
[1] FALSE
> x<-n_diff(df1, df2)
[1] "Error = 4"
> x
[1] FALSE
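Note that sum(a != b) returns NA as soon as either input contains a missing value. A possible variant, assuming that two NAs in the same position should count as equal and that an NA against a value should count as an error; count_diff and the sample data x and y are hypothetical:

```r
# Hypothetical NA-aware variant of n_diff(): returns the number of differing cells
count_diff <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  a <- unlist(lapply(a, as.character), use.names = FALSE)
  b <- unlist(lapply(b, as.character), use.names = FALSE)
  # A cell differs if exactly one side is NA, or both are present and unequal
  mismatch <- (is.na(a) != is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
  sum(mismatch)
}

# Made-up sample data containing missing values
x <- data.frame(a = c(1, NA, 3), b = c("u", "v", NA), stringsAsFactors = FALSE)
y <- data.frame(a = c(1, NA, 4), b = c("u", "w", "z"), stringsAsFactors = FALSE)

count_diff(x, y)   # 3
```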

Upvotes: 1

itsDV7

Reputation: 854

# install.packages("diffr")
library(diffr)
diffr("file1.R", "file2.R")

This will give you the differences between the files:

library(tools)
Rdiff("file1.R", "file2.R", Log = TRUE)
# For the number of mismatched lines:
all.equal(readLines("file1.R"), readLines("file2.R"))

This is all I can suggest right now.


1c1
< df1<-read.table (text=" class   type    colour  gender
---
> df2<-read.table (text=" class   type    colour  gender
3c3
< 11  1   green   M
---
> 11  2   gree    M
4c4
< 14  2   red M
---
> 14  2   red N
6c6
< 16  1   red F
---
> 18  1   red F
$status
[1] 1

$out
[1] "1c1\n< df1<-read.table (text=\" class   type    colour  gender\n---\n> df2<-read.table (text=\" class   type    colour  gender"
[2] "3c3\n< 11  1   green   M\n---\n> 11  2   gree    M"                                                                            
[3] "4c4\n< 14  2   red M\n---\n> 14  2   red N"                                                                                    
[4] "6c6\n< 16  1   red F\n---\n> 18  1   red F"
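Keep in mind that a file diff counts changed lines, not changed cells: line 3 above contains two wrong values but shows up as a single change. A rough sketch of a line-by-line comparison in base R, assuming both files have the same number of lines; the temporary files and their contents are made up for illustration:

```r
# Write two small sample files (hypothetical contents)
p1 <- tempfile(fileext = ".txt")
p2 <- tempfile(fileext = ".txt")
writeLines(c("class type colour gender", "12 1 yellow F", "11 1 green M"), p1)
writeLines(c("class type colour gender", "12 1 yellow F", "11 2 gree M"), p2)

l1 <- readLines(p1)
l2 <- readLines(p2)
stopifnot(length(l1) == length(l2))

# Indices of the lines that differ
changed <- which(l1 != l2)
changed           # 3
length(changed)   # 1 changed line, although two cells differ
```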

Upvotes: 1
