Reputation: 22663
I do have a lot of strings in my database (PostgreSQL), an example:
with mystrings as (
select 'H e l l o, how are you'::varchar string union all
select 'I am fine, t h a n k you'::varchar string union all
select 'This is s t r a n g e text'::varchar string union all
select 'With c r a z y space b e t w e e n characters'::varchar string
)
select * from mystrings
Is there a way how I can remove spaces between characters in words? For my example the result should be:
Hello, how are you
I am fine, thank you
This is strange text
With crazy space between characters
I started with replace
, but there are many such words with spaces between characters and I cannot even find them all.
Because it might be difficult to meaningfully concatenate characters, it might be better idea to get just list of concatenation candidates. Using example data, the result should be:
H e l l o
t h a n k
s t r a n g e
c r a z y
b e t w e e n
Such query should find and return all substrings in string when there are at least three individual characters separated by two spaces (and continue until patern [space] individual character
occurs):
He l l o how are you --> llo
H e l l o how are you --> Hello
C r a z y space b e t w e e n --> {crazy, between}
Upvotes: 2
Views: 1762
Reputation: 5012
As per your edited question, the below gets all the possible candidates that have a least three individual characters separated by two spaces
SELECT
data || ' --> {' || replace_candidates || '}'
FROM(
SELECT
data,
( SELECT
array_to_string( array_agg( data ),',' )
FROM (
SELECT
data,
length( data )
FROM (
SELECT
replace( data, ' ', '' ) AS data
FROM
regexp_split_to_table( data, '\S{2,}' ) AS data
) t
WHERE length( data ) > 2
) t ) AS replace_candidates
FROM
mystrings
) T
WHERE
replace_candidates IS NOT NULL
Working
Start looking at the inner most query first (the one with regexp_split_to_table
)
regexg
gets all strings that have 2 characters in a sequence
(not separated
by a space)regexp_split_to_table
gets the inverse of the match, more on it hereempty char
and filter records
having a length greater than 2
The reaming are array aggregate
functions to take care of formatting
, as per your requirement, more of this here
Results
H e l l o how are you --> {Hello}
I am fine, t h a n k you --> {thank}
This is s t r a n g e text --> {strange}
With c r a z y space b e t w e e n characters --> {crazy,between}
SOME MORE TEST T E X T --> {TEXT}
Note: It considers chars which fall in as [space][char][space]
, but, you can modify it to suit your needs as [space][space][char][space]
or [space][char][special_char][space]
...
Hope this helps ;p
Upvotes: 1
Reputation: 22663
The following finds possible concatenation candidates:
with mystrings as (
select 'H e l l o, how are you'::varchar string union all
select 'I am fine, t h a n k you'::varchar string union all
select 'This is s t r a n g e text'::varchar string union all
select 'With c r a z y space b e t w e e n characters'::varchar string
)
, u as (
select string, strpart[rn] as strpart, rn
from (
select *, generate_subscripts(strpart, 1) as rn
from (
select string, string_to_array(replace(string,',',''), ' ') as strpart
from mystrings
) x
) y
)
,w as (
select
string,strpart,rn,
case when length(strpart) = 1 then 1 else 0 end as indchar ,
case when coalesce(length(lag(strpart) over()),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strstart,
case when coalesce(length(lead(strpart) over()),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strend
from u
)
,x as (
select
string,rn,strpart,indchar,strstart,
sum(strstart) over (order by string, rn) as strid
from w
where indchar = 1 and not (strstart = 1 and strend = 1)
)
select string, array_to_string(array_agg(strpart),'') as candidate from x group by string, strid
Upvotes: 0
Reputation:
You can use a resource such as online dictionary if the word exists then you dont have to remove spaces otherwise remove spaces or you can use a table where you have to put all strings that exist and then you have to check with that table.Hope you got my point.
Upvotes: 0