sreeram

Reputation: 61

Check which rows have every word in a string in lower case and space separated

I have a column of string values, as shown below:

a <- c("iam best in the world", "you are awesome", "Iam Good")

I need to find the rows in which every word is lower case and the words are separated by spaces.

I know how to convert the words to upper case and keep them space separated, but I need to find which rows are lower case and space separated.

I have tried:

grepl("\\b([a-z])\\s([a-z])\\b", a, perl = TRUE)
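For reference, a small check of why this attempt does not work: the pattern matches a single lower-case letter, one whitespace character, and another single lower-case letter, and the word boundaries force both letters to be one-letter words. None of the sample strings contains two adjacent one-letter words, so every row returns FALSE. The strings "a b" and "ab cd" are made up for illustration.

```r
a <- c("iam best in the world", "you are awesome", "Iam Good")

# Pattern from the question: \b + one letter + one whitespace + one letter + \b.
# The \b before the first letter and the \s right after it make it a one-letter
# word; likewise the space before the second letter and the \b after it.
pattern <- "\\b([a-z])\\s([a-z])\\b"

grepl(pattern, a, perl = TRUE)
#> [1] FALSE FALSE FALSE

# It only fires when two one-letter words sit next to each other
grepl(pattern, c("a b", "ab cd"), perl = TRUE)
#> [1]  TRUE FALSE
```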

Upvotes: 1

Views: 52

Answers (5)

akrun

Reputation: 887951

We can use filter from dplyr to keep the rows whose text is unchanged by tolower():

library(dplyr)
a %>%
   filter(tolower(some_col) == some_col)
#   v1              some_col
#1  1 iam best in the world
#2  2       you are awesome

Upvotes: 0

Ronak Shah

Reputation: 389325

We can convert the column to lower case and compare it with the original values (using @Tim's data):

a[tolower(a$some_col) == a$some_col, ]

#  v1              some_col
#1  1 iam best in the world
#2  2       you are awesome

If we also need to check that the string contains a space, we can add a second condition with grepl:

a[tolower(a$some_col) == a$some_col & grepl("\\s+", a$some_col), ]
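A minimal sketch that wraps both conditions in a function and runs them on a few extra strings. The helper name is_lower_spaced() and the strings "hello" and "hello world 2" are made up for illustration. Note that tolower(x) == x only rules out upper-case letters, so digits and punctuation still pass.

```r
# Hypothetical helper wrapping the two conditions from this answer
is_lower_spaced <- function(x) {
  x <- as.character(x)                 # also works for factor columns
  tolower(x) == x & grepl("\\s+", x)   # no upper-case letters AND some whitespace
}

x <- c("iam best in the world", "you are awesome", "Iam Good",
       "hello", "hello world 2")
is_lower_spaced(x)
#> [1]  TRUE  TRUE FALSE FALSE  TRUE
```

The single word "hello" fails only the whitespace condition, while "hello world 2" passes both because digits are not affected by tolower().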

Upvotes: 0

Sotos

Reputation: 51612

Another idea is to use stri_trans_totitle from the stringi package and keep the rows that are not already in title case:

a[stringi::stri_trans_totitle(as.character(a$some_col)) != a$some_col, ]

#  v1              some_col
#1  1 iam best in the world
#2  2       you are awesome
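The title-case comparison is not quite the same test as the tolower() check: it keeps every row that is not already in title case, including mixed rows such as "iam Good". To illustrate without depending on stringi, the sketch below uses a hypothetical base-R helper, to_title_ascii(), as a rough ASCII-only stand-in for stri_trans_totitle(); it is an assumption, not stringi's implementation. The row "iam Good" is made up for illustration.

```r
# Hypothetical ASCII-only stand-in for stringi::stri_trans_totitle():
# lower-case everything, then upper-case the first letter of each word
# (\\U in the replacement is supported by gsub() only with perl = TRUE).
to_title_ascii <- function(x) {
  gsub("\\b([a-z])", "\\U\\1", tolower(x), perl = TRUE)
}

x <- c("iam best in the world", "you are awesome", "Iam Good", "iam Good")

not_title <- to_title_ascii(x) != x   # idea behind this answer
all_lower <- tolower(x) == x          # tolower() check from the other answers

not_title
#> [1]  TRUE  TRUE FALSE  TRUE
all_lower
#> [1]  TRUE  TRUE FALSE FALSE
```

The two tests agree on the original data and only differ on mixed-case rows.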

Upvotes: 0

Andryas Waurzenczak

Reputation: 469

x <- c("iam best in the world", "you are awesome", "Iam Good")

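Here I took a different approach: first I split each phrase on spaces, then I keep only the words that start with a lower-case letter. The output is a list with one element per phrase, containing only its lower-case words.

# Split each phrase on spaces, then keep the words starting with a lower-case letter
lapply(strsplit(x, " "), function(words) {
  words[grepl("^[a-z]", words)]
})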
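To turn the per-word result into one TRUE/FALSE per row, a sketch under the assumption that words are separated by single spaces could require every piece to be a non-empty run of lower-case letters and at least two pieces per phrase:

```r
x <- c("iam best in the world", "you are awesome", "Iam Good")

# Split each phrase on single spaces; every piece must be lower-case letters only.
# A leading or doubled space produces an empty piece "" and therefore fails.
all_lower_words <- vapply(
  strsplit(x, " ", fixed = TRUE),
  function(words) length(words) > 1 && all(grepl("^[a-z]+$", words)),
  logical(1)
)

all_lower_words
#> [1]  TRUE  TRUE FALSE

x[all_lower_words]
#> [1] "iam best in the world" "you are awesome"
```

vapply() is used instead of sapply() so the result is always a logical vector of the same length as x.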

Upvotes: 0

Tim Biegeleisen

Reputation: 522762

We can try using grepl with the pattern \b[a-z]+(?:\s+[a-z]+)*\b:

matches = a[grepl("\\b[a-z]+(?:\\s+[a-z]+)*\\b", a$some_col), ]
matches

  v1              some_col
1  1 iam best in the world
2  2       you are awesome

Data:

a <- data.frame(v1=c(1:3),
                some_col=c("iam best in the world", "you are awesome", "Iam Good"))

The regex matches an all-lowercase word, optionally followed by whitespace and further all-lowercase words. The word boundaries stop a match from starting or ending inside a word, so a capitalised word such as "Iam" or "Good" cannot produce a false match. Note, however, that the pattern is not anchored to the start and end of the string, so any string containing at least one all-lowercase word (for example "iam Good") is also flagged.
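A minimal sketch comparing this pattern with a variant anchored by ^ and $, which requires the entire string to consist of lower-case words. The extra row "iam Good" is made up to show the difference, and the anchored pattern assumes single spaces between words (use \s+ instead of the literal space to allow any whitespace). perl = TRUE is set so the (?:...) group is handled by PCRE.

```r
a <- data.frame(v1 = 1:4,
                some_col = c("iam best in the world", "you are awesome",
                             "Iam Good", "iam Good"),
                stringsAsFactors = FALSE)

unanchored <- "\\b[a-z]+(?:\\s+[a-z]+)*\\b"   # pattern from this answer
# Anchored variant: the whole string must be lower-case words joined by single spaces
anchored   <- "^[a-z]+(?: [a-z]+)*$"

grepl(unanchored, a$some_col, perl = TRUE)
#> [1]  TRUE  TRUE FALSE  TRUE
grepl(anchored, a$some_col, perl = TRUE)
#> [1]  TRUE  TRUE FALSE FALSE

# Keep only the rows that are entirely lower-case words
a[grepl(anchored, a$some_col, perl = TRUE), ]
```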

Upvotes: 2
