Reputation: 1824
Let's assume I have a string like this:
x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:'
From that, I want to get:
:heavy_black_heart:
:smiling_face:
To do that, I did the following:
import re
result = re.search(':(.*?):', x)
result.group()
It only gives me ':heavy_black_heart:'. How can I make it return all of them? If possible, I want to store them in a dictionary once I have found them all.
Upvotes: 2
Views: 114
Reputation: 54223
Just for fun, here's a simple solution without regex. It splits around ':' and keeps the elements at odd indices:
>>> text = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:'
>>> text.split(':')[1::2]
['heavy_black_heart', 'heavy_black_heart', 'smiling_face']
>>> set(text.split(':')[1::2])
set(['heavy_black_heart', 'smiling_face'])
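One caveat, sketched here under the assumption that the text may contain colons that are not emoji delimiters (the string `text_with_time` is made up for illustration): every stray colon shifts the odd/even alignment, so the split approach can return ordinary text instead of emoji names.

```python
# Hypothetical input: the clock time adds a colon that does not delimit an emoji.
text_with_time = 'Meet at 10:30 :smiling_face:'

parts = text_with_time.split(':')
# parts is ['Meet at 10', '30 ', 'smiling_face', '']

# The odd-index slice now picks up the wrong pieces.
odd_items = parts[1::2]
print(odd_items)
# ['30 ', '']
```

For input where every colon belongs to a :name: pair, as in the question, the split trick works as shown above.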
Upvotes: 0
Reputation: 2491
print(re.findall(':.*?:', x))
does the job.
Output:
[':heavy_black_heart:', ':heavy_black_heart:', ':smiling_face:']
If you also want to remove the duplicates, use:
res = re.findall(':.*?:', x)
unique = set(res)  # a set keeps one copy of each match
print(list(unique))
Output:
[':heavy_black_heart:', ':smiling_face:']
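Since the question asks about storing the matches in a dictionary, here is a minimal sketch, assuming the dictionary should map each emoji name to the number of times it occurs. The name `counts` is only illustrative; `collections.Counter` is a dict subclass from the standard library.

```python
import re
from collections import Counter

x = ('Wish she could have told me herself. @NicoleScherzy #nicolescherzinger '
     '#OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: '
     'some string too :smiling_face:')

# Count every non-overlapping :...: match.
counts = Counter(re.findall(':.*?:', x))

# Convert to a plain dict if a Counter is not wanted.
print(dict(counts))
```

Here `counts[':heavy_black_heart:']` is 2 and `counts[':smiling_face:']` is 1.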
Upvotes: 3
Reputation: 626806
You seem to want to match smilies, which are some symbols in between two : characters. The .*? can match 0 symbols, so your regex can also match ::, which I think is not what you want. Besides, re.search only returns one match, the first, and to get multiple matches you usually use re.findall or re.finditer.
I think you need
set(re.findall(r':[^:]+:', x))
or, if you only need to match word chars between the two colons:
set(re.findall(r':\w+:', x))
or, if you want to match any non-whitespace chars between the two colons:
set(re.findall(r':[^\s:]+:', x))
re.findall will find all non-overlapping occurrences, and set will remove dupes.
The patterns match :, then 1 or more chars other than : ([^:]+), or 1 or more letters, digits and _ (\w+), or 1 or more chars that are neither whitespace nor : ([^\s:]+), and then : again.
>>> import re
>>> x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:'
>>> print(set(re.findall(r':[^:]+:', x)))
{':smiling_face:', ':heavy_black_heart:'}
>>>
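To make the difference between the patterns concrete, here is a small sketch on a made-up string (`sample` is not from the question) that contains an empty :: pair and plain text between two colons:

```python
import re

# Made-up sample: an empty '::' pair, then plain text, then one real emoji name.
sample = 'a :: b :smile: c'

patterns = [r':.*?:', r':[^:]+:', r':\w+:', r':[^\s:]+:']
results = {p: re.findall(p, sample) for p in patterns}

for p in patterns:
    print(p, results[p])
# :.*?: ['::', ':smile:']
# :[^:]+: [': b :']
# :\w+: [':smile:']
# :[^\s:]+: [':smile:']
```

On this input, :[^:]+: no longer matches the empty pair, but it still accepts spaces, so it consumes ': b :' and then misses :smile:. The two stricter patterns return only the emoji name. On the string from the question, all four patterns find the same matches.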
Upvotes: 2
Reputation: 1179
import re
x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:'
print(set(re.findall(':.*?:', x)))
Output:
{':heavy_black_heart:', ':smiling_face:'}
Upvotes: 0