Momchill

Reputation: 466

Removing rows in a dataframe where a word is lowercase

I have a data frame (df) from which I want to delete every row where the first word in a column (df$a) is lowercase. I suppose the solution involves regular expressions, but I have very little experience with them. I've also looked at the lettercase and textclean packages but was unable to find a concrete illustration for my needs. Thank you!

Upvotes: 1

Views: 1072

Answers (2)

prosoitos

Reputation: 7347

library(tidyverse)

Toy example with a mix of upper and lower case values:

df <- tibble(
  a = c("Value1", "value2", "Value3"),
  b = c("value4", "Value5", "value6"),
  c = c("value7", "value8", "value9"),
  d = 1:3
)

df

# A tibble: 3 x 4
  a      b      c          d
  <chr>  <chr>  <chr>  <int>
1 Value1 value4 value7     1
2 value2 Value5 value8     2
3 Value3 value6 value9     3

Code

Base R:

df[!grepl("^[[:lower:]].*$", df$a), ]

Tidyverse:

df[!str_detect(df$a, "^[[:lower:]].*$"), ]

Result

# A tibble: 2 x 4
  a      b      c          d
  <chr>  <chr>  <chr>  <int>
1 Value1 value4 value7     1
2 Value3 value6 value9     3
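
In base R's default regex engine, a POSIX class name such as [:lower:] only acts as a class inside a bracket expression, written [[:lower:]]; on its own it is read as a plain set of the characters between the brackets. A minimal sketch on the toy values of column a, where as_set, as_class and kept are names introduced here for illustration:

```r
# Toy values matching column a of the example above
a <- c("Value1", "value2", "Value3")

# Single brackets: base R reads this as a set of the literal
# characters ":", "l", "o", "w", "e", "r", not as a class
as_set <- grepl("^[:lower:]", a)

# Double brackets: the POSIX class of lowercase letters
as_class <- grepl("^[[:lower:]]", a)

print(as_set)
print(as_class)

# Values that do not start with a lowercase letter
kept <- a[!as_class]
print(kept)
```

Only the double-bracket form flags "value2" here, so it is the safer spelling when the same pattern is shared between grepl and str_detect.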

Note that this also works if you have several words per value (since you only care about the first character of the first word, it doesn't matter whether there are word boundaries):

df <- tibble(
  a = c("Word1 and other words", "word2 AND others", "Word3 And Other Words"),
  b = c("word4", "Word5", "word6"),
  c = c("word7", "word8", "word9"),
  d = 1:3
)

df[!grepl("^[[:lower:]].*$", df$a), ]

# A tibble: 2 x 4
  a                     b     c         d
  <chr>                 <chr> <chr> <int>
1 Word1 and other words word4 word7     1
2 Word3 And Other Words word6 word9     3

Upvotes: 2

akrun

Reputation: 887213

We can use grepl to match a first word made up entirely of lowercase letters:

df[!grepl("^[a-z]+\\b", df$a), , drop = FALSE]
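
The pattern above is stricter than a test on the first character alone: a first word that contains a digit, such as word2, is not matched, because there is no word boundary directly after the lowercase letters. A minimal sketch comparing the two approaches, where x holds hypothetical sample values that are not taken from the question:

```r
# Hypothetical sample values, for illustration only
x <- c("word one", "word2 AND others", "Word four")

# TRUE when the first character is a lowercase letter
first_char_lower <- grepl("^[[:lower:]]", x)

# TRUE when the whole first word consists of the letters a-z
first_word_lower <- grepl("^[a-z]+\\b", x)

print(data.frame(x, first_char_lower, first_word_lower))
```

Which test to use depends on whether a value like "word2 AND others" should count as starting with a lowercase word.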

Upvotes: 2
