Compare Two Columns Without Case Sensitivity

The table below has two columns, A and B, that I want to compare. If the value in A doesn't match the value in B, I keep the unique ID that ties the two together so I can track mismatches.

The problem with this approach is that R is case sensitive by default. Is there a way to make this particular code ignore case?

Input Data

data <- read.table(header = TRUE, text = "A ID  B
                   mA   100 MA
                   ab   101 ab
                   Ca   102 Ca
                   KaK  103 KAK")

A   ID  B
mA  100 MA
ab  101 ab
Ca  102 Ca
KaK 103 KAK

Code To Compare

output <- as.data.frame(data$ID[as.character(data$A) != as.character(data$B)])

Output

ID
100
103

If case is ignored, the output should be an empty data frame, because every row matches.

Upvotes: 1

Answers (3)

Tony Ladson

Reputation: 3639

Two other approaches

library(tidyverse)
library(stringr)

my_data <- tribble(~A, ~ID, ~B,
                   'mA',   100, 'MA',
                   'ab',   101, 'ab',
                   'Ca',   102, 'Ca',
                   'KaK',  103, 'KAK',
                   'AA',   104, 'BB',
                   'cd',   105, 'cd',
                   'aa',   106, 'bb')

# returns a vector of IDs where A matches the pattern in B, ignoring case
my_data$ID[str_detect(my_data$A, regex(my_data$B, ignore_case = TRUE))]

#[1] 100 101 102 103 105

# Processing and returning a tibble
my_data %>% 
  filter(str_detect(A, regex(B, ignore_case = TRUE))) %>% 
  select(ID)

# A tibble: 5 x 1
#     ID
#  <dbl>
# 1   100
# 2   101
# 3   102
# 4   103
# 5   105
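
One thing to keep in mind with the approach above: str_detect() checks whether the pattern occurs anywhere in the string, so it is not quite the same as testing equality. A minimal sketch of the difference, using hypothetical values that are not in the question's data:

```r
library(stringr)

# Hypothetical values, not from the question: "AB" is a substring of "abc"
A <- c("abc", "ab")
B <- c("AB", "ab")

# Pattern detection: TRUE wherever B occurs anywhere inside A, ignoring case
partial <- str_detect(A, regex(B, ignore_case = TRUE))

# Whole-string equality after lower-casing both sides
exact <- str_to_lower(A) == str_to_lower(B)

print(partial)
print(exact)
```

If the columns can contain substrings of each other, or regex metacharacters, the lower-cased equality test is probably the safer choice.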

Upvotes: 1

user11863180

Sorry! I cannot comment, but there are a couple of ways. Use grep with ignore.case = TRUE, or wrap both sides in a toupper() or tolower() call.

Ok, got a laptop:

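# Start from the question's data frame
dat <- as.data.frame(data)

# Convert every column to upper case (all columns become character)
dat[] <- lapply(dat, toupper)

# Add ! before the condition to return the opposite
data.frame(ID = dat$ID[dat$A %in% dat$B])
#    ID
# 1 100
# 2 101
# 3 102
# 4 103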
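
Note that %in% tests whether a value appears anywhere in the other column, not whether the two values in the same row agree. A small sketch of the difference, assuming a hypothetical two-row data frame in which the values are swapped between rows:

```r
# Hypothetical example, not the question's data: values swapped between rows
dat <- data.frame(A = c("x", "y"), ID = c(1, 2), B = c("Y", "X"),
                  stringsAsFactors = FALSE)

# Set membership: is each A found anywhere in B, ignoring case?
in_any_row <- toupper(dat$A) %in% toupper(dat$B)

# Row-wise equality: does A equal B in the same row, ignoring case?
same_row <- toupper(dat$A) == toupper(dat$B)

print(dat$ID[in_any_row])  # IDs whose A occurs somewhere in B
print(dat$ID[same_row])    # IDs whose A equals B in the same row
```

For the question's data both give the same IDs, but for a row-by-row comparison == (or != for mismatches) is likely what is wanted.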

Upvotes: 1

Shree

Reputation: 11140

Here's one way: change the case of both columns to either upper (toupper) or lower (tolower). Also note the correct way to subset below. You need to add drop = FALSE when subsetting a single column to keep the data frame structure.

data[tolower(data$A) != tolower(data$B), "ID", drop = FALSE]

[1] ID
<0 rows> (or 0-length row.names)
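
To see the same comparison return a non-empty result, here is a minimal sketch that rebuilds the question's data and adds one hypothetical row (ID 104, not in the original data) whose values differ by more than case:

```r
# Question's data plus one hypothetical row (ID 104) that truly differs
data <- data.frame(
  A  = c("mA", "ab", "Ca", "KaK", "AA"),
  ID = c(100, 101, 102, 103, 104),
  B  = c("MA", "ab", "Ca", "KAK", "BB"),
  stringsAsFactors = FALSE
)

# Row-wise comparison after lower-casing both sides
mismatch <- tolower(data$A) != tolower(data$B)

# drop = FALSE keeps the result a one-column data frame
output <- data[mismatch, "ID", drop = FALSE]
print(output)
```

Only the added row should come back, since the four original rows differ in case alone.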

Upvotes: 3
