supercontra
supercontra

Reputation: 175

regextract in google sheets to find twitter usernames

I have a spreadsheet of tweets and want to isolate username mentions in Google Sheets. Somehow, regexps that work in R or other languages are not doing the job there.

An example:

RT @Neromoto: @cazainfractor inconsciente agresiva y poco ciudadana conductora

Desired output:

@Neromoto
@cazainfractor

I have tried this: REGEXEXTRACT(B1,(^|[^@\w])@(\w{1,15})\b).

Upvotes: 1

Views: 354

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627335

First of all, your (^|[^@\w])@(\w{1,15})\b regex pattern must be put inside a string literal, i.e. double quotes. Then note that every capturing group will be output, you may want to make the first group non-capturing by replacing ( with (?:. Also, the last \b is redundant, after the last \w matched, there will be either the end of string, or the non-word char.

I'd rather suggest

=REGEXEXTRACT(B1,"\B@\w{1,15}")

enter image description here

Or

=REGEXREPLACE(B1,"(\B@\w{1,15})\s*|.","$1 ")

enter image description here

Details:

  • \B - a non-word boundary (that is, before @, there can be either start of string or a non-word char)
  • @ - a @ char
  • \w{1,15} - 1 to 15 word chars (if you do not care about the length, replace {1,15} with +)

And the second regex details:

  • (\B@\w{1,15})\s* - Group 1 capturing @ at the non-word boundary position, 1 to 15 word chars and then 0+ whitespaces (in the replacement, the $1 backreference inserts the found mentions back into the resulting string)
  • | - or
  • . - any 1 char.

Upvotes: 2

Related Questions