Reputation: 9793
patterns <- c("Athens, Greece", "New York, New York, USA", "Georgia,USA", "Southern California, USA ")
I have a collection of strings in patterns, and I would like to focus only on those that contain exactly one comma. For example, the string "New York, New York, USA" should be discarded. I tried the following regular expression to find the strings with exactly one comma, but it didn't work:
grep(",{1}", patterns)
> [1] 1 2 3 4
My ultimate goal is to reorder these strings so that the part after the comma comes first, the comma is removed, and excess spaces are deleted. The final output should look something like this:
final_output
> [1] "Greece Athens" "USA Georgia" "USA Southern California"
Upvotes: 1
Views: 136
Reputation: 79238
In base R, use sub() as shown below:
sub("([^,]+), *([^,]+)$|.*", "\\2 \\1", trimws(patterns))
[1] "Greece Athens" " "
[3] "USA Georgia" "USA Southern California"
If you need to drop the empty values:
grep("\\w", sub("([^,]+), *([^,]+)$|.*", "\\2 \\1", trimws(patterns)), value = TRUE)
[1] "Greece Athens" "USA Georgia" "USA Southern California"
Upvotes: 1
Reputation: 76450
Here is a regex:
patterns <- c("Athens, Greece", "New York, New York, USA",
"Georgia,USA", "Southern California, USA ")
grep("^[^,]*,[^,]*$", patterns)
#> [1] 1 3 4
Created on 2022-08-21 by the reprex package (v2.0.1)
Explanation:
^[^,]* matches any characters other than a comma, starting at the beginning of the string;
, matches a literal comma;
[^,]*$ matches anything other than a comma, up to the end of the string.
Index by grep's result or use the argument value.
grep("^[^,]*,[^,]*$", patterns, value = TRUE)
#> [1] "Athens, Greece" "Georgia,USA"
#> [3] "Southern California, USA "
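To see why the original attempt with ,{1} selected every string, it may help to compare the unanchored and anchored patterns side by side. This is only an illustrative sketch; the names unanchored and anchored are not from the posts above.

```r
patterns <- c("Athens, Greece", "New York, New York, USA",
              "Georgia,USA", "Southern California, USA ")

# ",{1}" is equivalent to ",": it succeeds if a comma occurs anywhere,
# so every string with at least one comma is selected.
unanchored <- grepl(",{1}", patterns)

# Anchoring with ^ and $ and excluding commas on both sides of the
# literal comma forces the whole string to contain exactly one comma.
anchored <- grepl("^[^,]*,[^,]*$", patterns)

print(unanchored)
print(anchored)
```

The quantifier {1} only limits how many commas a single match consumes; it says nothing about the rest of the string, which is why the anchors are needed.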
As for the second goal, here is a way. Once again, base R only.
patterns <- c("Athens, Greece", "New York, New York, USA",
"Georgia,USA", "Southern California, USA ", "This That")
v <- grep("^[^,]*,[^,]*$", patterns, value = TRUE)
sapply(strsplit(v, ","), \(x) paste(trimws(rev(x)), collapse = " "))
#> [1] "Greece Athens" "USA Georgia"
#> [3] "USA Southern California"
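If both steps are needed repeatedly, the filter and the reordering could be wrapped in a small helper. The sketch below assumes base R only; the function name reorder_single_comma is hypothetical and not part of the answer above.

```r
# Hypothetical helper: keep strings with exactly one comma and
# swap the parts around it, trimming surrounding whitespace.
reorder_single_comma <- function(x) {
  keep <- grepl("^[^,]*,[^,]*$", x)               # exactly one comma
  parts <- strsplit(x[keep], ",", fixed = TRUE)   # split on the literal comma
  vapply(parts,
         function(p) paste(trimws(rev(p)), collapse = " "),
         character(1))
}

patterns <- c("Athens, Greece", "New York, New York, USA",
              "Georgia,USA", "Southern California, USA ", "This That")
result <- reorder_single_comma(patterns)
print(result)
```

Using vapply() instead of sapply() is a design choice here: it guarantees a character vector even when no string passes the filter.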
Upvotes: 3
Reputation: 35574
You could use gregexpr() to count how many commas each string contains.
n.comma <- sapply(gregexpr(',', patterns), \(x) sum(x > 0))
n.comma
# [1] 1 2 1 1
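As a cross-check on the counts above, the same numbers can be obtained in base R in at least two other ways. This is a sketch, not part of the original answer; the names n_by_nchar and n_by_regmatches are made up for illustration.

```r
patterns <- c("Athens, Greece", "New York, New York, USA",
              "Georgia,USA", "Southern California, USA ")

# Count commas as the number of characters lost when commas are removed.
n_by_nchar <- nchar(patterns) - nchar(gsub(",", "", patterns, fixed = TRUE))

# Count commas as the number of matches extracted for each string.
n_by_regmatches <- lengths(regmatches(patterns,
                                      gregexpr(",", patterns, fixed = TRUE)))

print(n_by_nchar)
print(n_by_regmatches)
```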
For your second goal:
sub('(.+)\\s*,\\s*(.+)', '\\2 \\1', trimws(patterns)[n.comma == 1])
# [1] "Greece Athens" "USA Georgia" "USA Southern California"
Upvotes: 1
Reputation: 19097
First, find the strings that have exactly one comma and use that to extract the relevant strings from patterns. Then capture all characters before and after the comma in two capture groups (note the parentheses ()). Finally, replace the string with the second capture group \\2, followed by a space, then the first capture group \\1.
library(stringr)
sub("^(\\w.+?),\\s*(\\w.+?)\\s{0,}$",
"\\2 \\1",
patterns[str_count(patterns, ",") == 1])
[1] "Greece Athens" "USA Georgia"
[3] "USA Southern California"
Upvotes: 1
Reputation: 6206
Here's a regex-free version. If you want to remove the strings with more than one comma, this combination of str_count and str_trim will work:
library(stringr)
res <- str_split(patterns[str_count(patterns, ",") < 2], ",", simplify = TRUE)
str_trim(paste(res[,2], res[,1], sep=" "))
[1] "Greece Athens" "USA Georgia"
[3] "USA Southern California"
Upvotes: 1