Reputation: 2119
I'm using postgresql 9.3 and I'm trying to inner join
two tables on common string patterns of a minimum length.
Also I'm a noob to SQL try to be patient if possible.
For example:
TABLE 1
ID DATA
1 '1234,5678,1234,1111'
2 '1111,2222'
3 '4321'
TABLE 2
IDa DATA
1a '1111,2222,1234,5678,4321'
2a '1111,3837,2222'
3a '4321'
joining DATA
column on strings matching more than 9 chars would yield:
IDa ID DATA
1a 2 '1111,2222'
1a 1 '1234,5678'
I had some success using LIKE
but I can't force a minimum match length condition(or at least I don't know how). I'm assuming a regex is the solution here but I haven't been able to write one that accomplished what I'm looking for.
Upvotes: 0
Views: 420
Reputation: 656714
Your examples match on 2 x 4 characters, not more than 9 chars
.
I suggest using array types (int[]
) instead of character types, in combination with the handy intersection operator &
from the additional module intarray. More details:
- Error when creating unaccent extension on PostgreSQL
- Postgresql intarray error: undefined symbol: pfree
Query could look like this
SELECT t2.ida, t1.id, t1.data & t2.data AS intersecting_data
FROM tbl1 t1
JOIN tbl2 t2 ON array_length(t1.data & t2.data, 1) = 2; -- or "> 1" ?
Not very efficient, this kind of cross join does not scale well.
Faster alternative: a normalized schema with 1 row per data item. Then the operation boils down to a relational-division.
tbl1_data
tbl1_id item
1 1234
1 5678
1 1234
1 1111
2 1111
...
tbl2_data
tbl1_id item
1a 1111
1a 2222
...
Then the query could be:
SELECT tbl1_id, tbl2_id, array_agg(item) AS data
FROM tbl1_data d1
JOIN tbl2_data d2 USING (item)
GROUP BY 1,2
HAVING count(*) = 2; -- or "> 1" ?
Upvotes: 1