Reputation: 27
I am working in postgresql. I have an int column "DOC_ID" and a text column "CONTENT". Each row of the table contains information about the document.
There are sections of text that match the regular expression
'#[A-Za-z]+:[A-Za-z]+#
. They are different!
For example: #Lot : version# , #Participant : name# and others.
I want for each value that satisfies the regular expression to get a column with the ID of the document in which it occurs. Considering that in one document it can be several times.
Example: My table:
DOC_ID CONTENT
1 '#Participant : name# cat bear dog #Participant : name# mouse'
2 'cat #Participant : name# bear'
3 'cat #Lot : version# dog'
10 'mouse #Lot : version# cow'
Result table
TAG DOC_ID
#Participant : name# 1
#Participant : name# 1
#Participant : name# 2
#Lot : version# 3
#Lot : version# 10
Please tell me how can I do this.
Upvotes: 1
Views: 56
Reputation: 121754
Use the function regexp_matches()
:
select
(regexp_matches(content, '#[A-Za-z0-9 №%--,. ]+:[A-Za-z0-9 №%--,. ]+#', 'g'))[1] as tag,
doc_id
from my_table
Read about the function and pattern matching in the documentation.
Test the query in db<>fiddle.
Upvotes: 1