Adrian

Reputation: 9793

How to re-order strings that are separated by a comma?

patterns <- c("Athens, Greece", "New York, New York, USA", "Georgia,USA", "Southern California,    USA ")

I have a collection of strings in patterns, and I want to keep only those that contain exactly one comma. For example, the string New York, New York, USA should be discarded. I tried the following regular expression to find the strings with exactly one comma, but it matched every string:

grep(",{1}", patterns)
> [1] 1 2 3 4

My ultimate goal is to re-order these strings so that the part after the comma comes first, the comma is removed, and excess spaces are deleted. The final output should look something like this:

final_output 
> [1] "Greece Athens"  "USA Georgia"  "USA Southern California"

Upvotes: 1

Views: 136

Answers (5)

Onyambu

Reputation: 79238

In base R, use sub as shown below:

sub("([^,]+), *([^,]+)$|.*", "\\2 \\1", trimws(patterns))

[1] "Greece Athens"           " "                      
[3] "USA Georgia"             "USA Southern California"

If you need to drop the blank values:

grep("\\w", sub("([^,]+), *([^,]+)$|.*", "\\2 \\1", trimws(patterns)), value = TRUE)

[1] "Greece Athens"           "USA Georgia"             "USA Southern California"

Upvotes: 1

Rui Barradas

Reputation: 76450

Here is a regex:

patterns <- c("Athens, Greece", "New York, New York, USA", 
              "Georgia,USA", "Southern California,    USA ")

grep("^[^,]*,[^,]*$", patterns)
#> [1] 1 3 4

Created on 2022-08-21 by the reprex package (v2.0.1)

Explanation:

  1. ^[^,]* matches zero or more characters other than a comma, starting at the beginning of the string;
  2. , matches a literal comma;
  3. [^,]*$ matches zero or more characters other than a comma, up to the end of the string;
  4. combined, these match only strings that contain exactly one comma, with no other commas before or after it.
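
To see why the anchors matter, here is a small sketch comparing the pattern from the question with the anchored one. The names loose and strict are only illustrative.

```r
patterns <- c("Athens, Greece", "New York, New York, USA",
              "Georgia,USA", "Southern California,    USA ")

# Unanchored: ",{1}" only asks for one comma somewhere in the string,
# so strings with two commas still match.
loose <- grepl(",{1}", patterns)

# Anchored: the whole string must be non-commas, one comma, non-commas.
strict <- grepl("^[^,]*,[^,]*$", patterns)

# Side-by-side view of both results.
data.frame(patterns, loose, strict)
```

Here loose is TRUE for all four strings, while strict is FALSE only for "New York, New York, USA".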

Index patterns by grep's result, or use the argument value = TRUE to return the matching strings directly.

grep("^[^,]*,[^,]*$", patterns, value = TRUE)
#> [1] "Athens, Greece"               "Georgia,USA"                 
#> [3] "Southern California,    USA "



As for the second goal, here is a way. Once again, base R only.

patterns <- c("Athens, Greece", "New York, New York, USA", 
              "Georgia,USA", "Southern California,    USA ", "This That")

v <- grep("^[^,]*,[^,]*$", patterns, value = TRUE)
sapply(strsplit(v, ","), \(x) paste(trimws(rev(x)), collapse = " "))
#> [1] "Greece Athens"           "USA Georgia"            
#> [3] "USA Southern California"


Upvotes: 3

Darren Tsai

Reputation: 35574

You could use gregexpr() to count the commas in each string.

n.comma <- sapply(gregexpr(',', patterns), \(x) sum(x > 0))
n.comma
# [1] 1 2 1 1
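
The sum(x > 0) part is there because gregexpr() reports "no match" as -1 rather than as an empty result. A minimal sketch with a comma-free string added (x, pos and n_comma are illustrative names, and fixed = TRUE is my addition so the comma is matched literally):

```r
x <- c("Athens, Greece", "New York, New York, USA", "This That")

# gregexpr() returns one integer vector of match positions per string;
# a string with no match yields the single value -1.
pos <- gregexpr(",", x, fixed = TRUE)

# Counting only positive positions turns that -1 into a count of 0.
n_comma <- sapply(pos, function(p) sum(p > 0))
n_comma
# [1] 1 2 0
```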

For your second goal:

sub('(.+)\\s*,\\s*(.+)', '\\2 \\1', trimws(patterns)[n.comma == 1])

# [1] "Greece Athens"   "USA Georgia"   "USA Southern California"

Upvotes: 1

benson23

Reputation: 19097

First, find the strings that have exactly one comma and use that to subset patterns. Then capture the characters before and after the comma in two capture groups (note the parentheses ()). Finally, replace each string with the second capture group \\2, followed by a space, then the first capture group \\1.

library(stringr)

sub("^(\\w.+?),\\s*(\\w.+?)\\s{0,}$", 
    "\\2 \\1", 
    patterns[str_count(patterns, ",") == 1])

[1] "Greece Athens"           "USA Georgia"            
[3] "USA Southern California"
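
If it helps to see what each capture group actually holds, here is a rough sketch using base R's regexec() and regmatches(). The pattern is a simplified variant written for this illustration (not the one above), and perl = TRUE is assumed so that the lazy quantifier +? behaves as in PCRE.

```r
x <- "Southern California,    USA "

# Two capture groups: text before the comma and text after it,
# with surrounding whitespace left outside the groups.
pat <- "^\\s*([^,]+?)\\s*,\\s*([^,]+?)\\s*$"

# regmatches() + regexec() return the full match followed by each group.
m <- regmatches(x, regexec(pat, x, perl = TRUE))[[1]]

before <- m[2]  # first capture group, what \\1 refers to
after  <- m[3]  # second capture group, what \\2 refers to

# Same effect as the replacement "\\2 \\1".
reordered <- paste(after, before)
reordered
# [1] "USA Southern California"
```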

Upvotes: 1

user438383

Reputation: 6206

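Here's a regex-free version. To keep only the strings with exactly one comma, this combination of str_count and str_trim will work:

library(stringr)
# keep strings with exactly one comma, then split each into two columns
res <- str_split(patterns[str_count(patterns, ",") == 1], ",", simplify = TRUE)

# trim each piece before pasting so no double space is left in the middle
paste(str_trim(res[, 2]), str_trim(res[, 1]), sep = " ")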
[1] "Greece Athens"           "USA Georgia"            
[3] "USA Southern California"
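
For anyone who would rather not load stringr, the same split-and-swap idea can be sketched in base R with literal matching (fixed = TRUE) throughout. This is only one possible way to write it, and n_comma, parts and result are illustrative names.

```r
patterns <- c("Athens, Greece", "New York, New York, USA",
              "Georgia,USA", "Southern California,    USA ")

# Count commas literally: string length minus length with commas removed.
n_comma <- nchar(patterns) - nchar(gsub(",", "", patterns, fixed = TRUE))

# Keep strings with exactly one comma and split them on the literal comma.
parts <- strsplit(patterns[n_comma == 1], ",", fixed = TRUE)

# Trim each piece, reverse the order, and join with a single space.
result <- vapply(parts, function(p) paste(rev(trimws(p)), collapse = " "),
                 character(1))
result
# [1] "Greece Athens"           "USA Georgia"
# [3] "USA Southern California"
```

One caveat: strsplit() drops a trailing empty piece, so a string such as "Georgia," would come back as just "Georgia".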

Upvotes: 1
