Reputation: 316
Test String:
str = "#www #SoulMusic #50_shades_of_Blue # ##WorldWideWeb
#okie_dokkie #fr!ends #!alPacino #wonderfulRide
#good#club #rhônealpes #trèsbon #øypålandet http://example.com/#comment
#moreTags #www nobody #h3y!boy #EMAIL"
This is what I tried:
String.split(str, ~r/\B(#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+)/, trim: true,
include_captures: true)
But it does not exclude the hashtag in the URL. This is what I receive:
["#www", " ", "#SoulMusic", " ", "#50_shades_of_Blue", " # #", "#WorldWideWeb", " ", "#okie_dokkie", " ", "#fr", "!ends #!alPacino ", "#wonderfulRide", " ", "#good", "#club ", "#rhônealpes", " ", "#trèsbon", " ", "#øypålandet", " http://example.com/", "#comment", " ", "#moreTags", " ", "#www", " nobody ", "#h3y", "!boy ", "#EMAIL"]
What I aim to get:
["#www", "#SoulMusic", "#50_shades_of_Blue", "#WorldWideWeb",
"#okie_dokkie", "#fr", "#wonderfulRide", "#good",
"#rhônealpes", "#trèsbon", "#øypålandet", "#moreTags", "#www",
"#h3y", "#EMAIL"]
Any help on this will be appreciated.
Upvotes: 1
Views: 189
Reputation: 222108
If you only need the matches, you're looking for Regex.scan/2:
iex(1)> str = "#www #SoulMusic #50_shades_of_Blue # ##WorldWideWeb
...(1)> #okie_dokkie #fr!ends #!alPacino #wonderfulRide
...(1)> #good#club #rhônealpes #trèsbon #gøypålandet http://example.com/#comment
...(1)> #moreTags #www nobody #EMAIL"
"#www #SoulMusic #50_shades_of_Blue # ##WorldWideWeb \n #okie_dokkie #fr!ends #!alPacino #wonderfulRide \n #good#club #rhônealpes #trèsbon #gøypålandet http://example.com/#comment \n #moreTags #www nobody #EMAIL"
iex(2)> Regex.scan(~r/\B#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+/, str)
[["#www"], ["#SoulMusic"], ["#50_shades_of_Blue"], ["#WorldWideWeb"],
["#okie_dokkie"], ["#fr"], ["#wonderfulRide"], ["#good"], ["#rhônealpes"],
["#trèsbon"], ["#gøypålandet"], ["#comment"], ["#moreTags"], ["#www"],
["#EMAIL"]]
This will return a list of lists. You can flatten it to get a list of strings using Enum.concat/1:
iex(3)> Regex.scan(~r/\B#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+/, str) |> Enum.concat
["#www", "#SoulMusic", "#50_shades_of_Blue", "#WorldWideWeb", "#okie_dokkie",
"#fr", "#wonderfulRide", "#good", "#rhônealpes", "#trèsbon",
"#gøypålandet", "#comment", "#moreTags", "#www", "#EMAIL"]
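Note that `\B` alone still lets `#comment` from the URL through: `/` and `#` are both non-word characters, so there is no word boundary between them. Here is a minimal sketch of one way to skip it, assuming a hashtag must not be directly preceded by a word character or a `/`. The module and function names are hypothetical, and swapping the hand-written ranges for `\p{L}`/`\p{N}` with the `u` modifier is my own choice for illustration, not something the original regex requires.

```elixir
defmodule HashtagSketch do
  # Hypothetical helper for illustration only.
  def extract(text) do
    # (?<![\w\/])     negative lookbehind: "#" must not follow a word character or "/"
    # [\p{L}\p{N}_]+  one or more Unicode letters, digits, or underscores
    # u               enables Unicode-aware matching
    ~r/(?<![\w\/])#[\p{L}\p{N}_]+/u
    |> Regex.scan(text)
    |> Enum.concat()
  end
end

# "#club" follows "d" and "#comment" follows "/", so both are skipped.
IO.inspect(HashtagSketch.extract("#good#club http://example.com/#comment ##WorldWideWeb #h3y!boy"))
# prints ["#good", "#WorldWideWeb", "#h3y"]
```

This is only a heuristic: it rejects any `#` glued to a preceding word character or slash, which covers the URL in the test string but may be too strict or too loose for other inputs.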
Upvotes: 2