Reputation: 1270
I have two files and I want to make sure these two files are the same.
File 1:
df1<-read.table (text=" class type colour gender
12 1 yellow F
11 1 green M
14 2 red M
18 2 red F
16 1 red F
", header=TRUE)
File 2:
df2<-read.table (text=" class type colour gender
12 1 yellow F
11 2 gree M
14 2 red N
18 2 red F
18 1 red F
", header=TRUE)
As we can see, df2 contains four errors. For example, the class in the last row of df2 is 18, but it should be 16 as in df1. Wherever the values in df1 and df2 are not equal, I want to get FALSE and then see the number of errors. The expected outcome is below, with Error = 4. I have nearly 100 columns, so this is just a small sample of the data.
out<-read.table (text=" class type colour gender
12 1 yellow F
11 FALSE FALSE M
14 2 red FALSE
18 2 red F
FALSE 1 red F
", header=TRUE)
Error =4
Upvotes: 3
Views: 251
Reputation: 101247
You can try
> ifelse(df1 == df2, as.matrix(df1), FALSE)
class type colour gender
[1,] "12" "1" "yellow" "F"
[2,] "11" "FALSE" "FALSE" "M"
[3,] "14" "2" "red" "FALSE"
[4,] "18" "2" "red" "F"
[5,] "FALSE" "1" "red" "F"
If you want to count how many errors there are:
> table(df1 == df2)
FALSE TRUE
4 16
or
> sum(df1 != df2)
[1] 4
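With nearly 100 columns it can also help to see where the errors are, not just how many. A minimal sketch, assuming the text columns are character rather than factors (stringsAsFactors = FALSE, the default since R 4.0.0); the names `mismatch`, `idx`, `m1`, `m2` and `errors` are only illustrative. It uses `colSums()` for per-column counts and `which(..., arr.ind = TRUE)` for the row/column position of each mismatch:

```r
# sample data from the question; stringsAsFactors = FALSE keeps text columns
# as character so columns with different values can still be compared
df1 <- read.table(text = "class type colour gender
12 1 yellow F
11 1 green M
14 2 red M
18 2 red F
16 1 red F", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "class type colour gender
12 1 yellow F
11 2 gree M
14 2 red N
18 2 red F
18 1 red F", header = TRUE, stringsAsFactors = FALSE)

mismatch <- df1 != df2                  # logical matrix, TRUE where values differ
colSums(mismatch)                       # number of errors in each column
idx <- which(mismatch, arr.ind = TRUE)  # (row, col) position of every error

# character copies so values of any column type can sit side by side
m1 <- do.call(cbind, lapply(df1, as.character))
m2 <- do.call(cbind, lapply(df2, as.character))

# one row per error: row number, column name and the two differing values
errors <- data.frame(
  row    = as.vector(idx[, "row"]),
  column = colnames(df1)[idx[, "col"]],
  df1    = m1[idx],
  df2    = m2[idx],
  stringsAsFactors = FALSE
)
errors
```

`which()` walks the matrix column by column, so the errors are listed in column order.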
Upvotes: 2
Reputation: 4243
A tweak to Daniel's answer
chck <- mapply(function(x,y){
x[x != y] <- 'FALSE'
x
}, df1, df2)
chck <- data.frame(chck)
#-------
class type colour gender
1 12 1 yellow F
2 11 FALSE FALSE M
3 14 2 red FALSE
4 18 2 red F
5 FALSE 1 red F
sum(chck == "FALSE")  # count the cells that were replaced by "FALSE"
# 4
Upvotes: 2
Reputation: 2253
You can loop over the columns with mapply and generate a logical matrix. Just make sure your data frames don't contain any factors (as.is = TRUE below keeps the text columns as character):
Create data:
df1<-read.table (text=" class type colour gender
12 1 yellow F
11 1 green M
14 2 red M
18 2 red F
16 1 red F
", header=TRUE, as.is = TRUE)
df2<-read.table (text=" class type colour gender
12 1 yellow F
11 2 gree M
14 2 red N
18 2 red F
18 1 red F
", header=TRUE, as.is = TRUE)
Compare dataframes:
match <- mapply(`==`, df1, df2)
table(match)
# FALSE TRUE
# 4 16
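If the data frames were already read with factors (for example with stringsAsFactors = TRUE, which was the default before R 4.0.0), `==` fails on columns whose level sets differ, such as colour and gender here. A minimal sketch of converting them afterwards; `unfactor` and `match_mat` are hypothetical names, not part of any package:

```r
# same sample data, but read with factors; comparing factors whose level
# sets differ stops with "level sets of factors are different"
df1 <- read.table(text = "class type colour gender
12 1 yellow F
11 1 green M
14 2 red M
18 2 red F
16 1 red F", header = TRUE, stringsAsFactors = TRUE)
df2 <- read.table(text = "class type colour gender
12 1 yellow F
11 2 gree M
14 2 red N
18 2 red F
18 1 red F", header = TRUE, stringsAsFactors = TRUE)

# hypothetical helper: turn every factor column into character, leave the rest
unfactor <- function(df) {
  df[] <- lapply(df, function(x) if (is.factor(x)) as.character(x) else x)
  df
}

df1 <- unfactor(df1)
df2 <- unfactor(df2)

match_mat <- mapply(`==`, df1, df2)  # compares plain values column by column
table(match_mat)                     # counts of FALSE (mismatch) and TRUE (match)
```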
Upvotes: 2
Reputation: 719
A simple solution in base R:
text1 <- unname(unlist(lapply(df1, as.character)))
text2 <- unname(unlist(lapply(df2, as.character)))
# number of differences
sum(text1 != text2)
You can make a function:
n_diff <- function(a, b){
a <- unname(unlist(lapply(a, as.character)))
b <- unname(unlist(lapply(b, as.character)))
n <- sum(a != b)
print(paste0("Error = ", n))
n == 0  # TRUE only when there are no differences
}
and the output:
> n_diff(df1, df2)
[1] "Error = 4"
[1] FALSE
> x<-n_diff(df1, df2)
[1] "Error = 4"
> x
[1] FALSE
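Flattening with `unlist()` only lines the cells up when both data frames have the same shape and column order, and `a != b` returns NA whenever either value is missing, which makes `sum()` return NA. A sketch of a hypothetical variant, `n_diff_checked`, that guards against both, assuming an NA in only one of the two frames should count as an error and NA in both as a match:

```r
df1 <- read.table(text = "class type colour gender
12 1 yellow F
11 1 green M
14 2 red M
18 2 red F
16 1 red F", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "class type colour gender
12 1 yellow F
11 2 gree M
14 2 red N
18 2 red F
18 1 red F", header = TRUE, stringsAsFactors = FALSE)

# hypothetical variant of n_diff with two extra safeguards
n_diff_checked <- function(a, b) {
  # the flattened vectors only line up if shape and column names match
  stopifnot(identical(dim(a), dim(b)), identical(names(a), names(b)))
  a <- unname(unlist(lapply(a, as.character)))
  b <- unname(unlist(lapply(b, as.character)))
  # value vs NA counts as a difference; NA vs NA counts as a match
  d <- (a != b) | (is.na(a) != is.na(b))
  d[is.na(d)] <- FALSE
  n <- sum(d)
  print(paste0("Error = ", n))
  n == 0
}

n_diff_checked(df1, df2)
```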
Upvotes: 1
Reputation: 854
# install.packages("diffr")
library(diffr)
diffr("file1.R", "file2.R")
This will give you the differences between the files:
library(tools)
Rdiff("file1.R", "file2.R", Log = TRUE)
# For the number of mismatching lines:
all.equal(readLines("file1.R"), readLines("file2.R"))
This is all I can suggest right now. Output of the Rdiff call:
1c1
< df1<-read.table (text=" class type colour gender
---
> df2<-read.table (text=" class type colour gender
3c3
< 11 1 green M
---
> 11 2 gree M
4c4
< 14 2 red M
---
> 14 2 red N
6c6
< 16 1 red F
---
> 18 1 red F
$status
[1] 1
$out
[1] "1c1\n< df1<-read.table (text=\" class type colour gender\n---\n> df2<-read.table (text=\" class type colour gender"
[2] "3c3\n< 11 1 green M\n---\n> 11 2 gree M"
[3] "4c4\n< 14 2 red M\n---\n> 14 2 red N"
[4] "6c6\n< 16 1 red F\n---\n> 18 1 red F"
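A line-based diff counts changed lines, not changed cells: in the output above, line 3 (`11 1 green M` vs `11 2 gree M`) holds two errors but shows up as a single change, and line 1 differs only because of the variable name. If the two files are plain whitespace-separated tables rather than R scripts, a minimal sketch contrasting the two counts (the temporary files are stand-ins for real files, and the line count assumes both files have the same number of lines):

```r
# write the two sample tables to temporary files (stand-ins for real files)
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("class type colour gender", "12 1 yellow F", "11 1 green M",
             "14 2 red M", "18 2 red F", "16 1 red F"), f1)
writeLines(c("class type colour gender", "12 1 yellow F", "11 2 gree M",
             "14 2 red N", "18 2 red F", "18 1 red F"), f2)

# line level: how many lines differ (assumes equal line counts)
lines_changed <- sum(readLines(f1) != readLines(f2))  # 3

# cell level: read the tables and count the individual values that differ
d1 <- read.table(f1, header = TRUE, stringsAsFactors = FALSE)
d2 <- read.table(f2, header = TRUE, stringsAsFactors = FALSE)
cells_changed <- sum(d1 != d2)                         # 4

unlink(c(f1, f2))  # remove the temporary files
```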
Upvotes: 1