Tomas Greif
Tomas Greif

Reputation: 22663

Remove spaces in words

I do have a lot of strings in my database (PostgreSQL), an example:

with mystrings as (
    select 'H e l l o, how are you'::varchar string union all
    select 'I am fine, t h a n k you'::varchar string union all
    select 'This is s t r a n g e text'::varchar string union all
    select 'With c r a z y space b e t w e e n characters'::varchar string 
)
select * from mystrings

Is there a way how I can remove spaces between characters in words? For my example the result should be:

Hello, how are you
I am fine, thank you
This is strange text
With crazy space between characters

I started with replace, but there are many such words with spaces between characters and I cannot even find them all.

Because it might be difficult to meaningfully concatenate characters, it might be better idea to get just list of concatenation candidates. Using example data, the result should be:

H e l l o
t h a n k
s t r a n g e
c r a z y
b e t w e e n

Such query should find and return all substrings in string when there are at least three individual characters separated by two spaces (and continue until patern [space] individual character occurs):

He l l o how are you --> llo
H e l l o how are you --> Hello
C r a z y space b e t w e e n --> {crazy, between}

Upvotes: 2

Views: 1762

Answers (3)

Akash
Akash

Reputation: 5012

As per your edited question, the below gets all the possible candidates that have a least three individual characters separated by two spaces

SELECT 
    data || ' --> {' || replace_candidates || '}'
FROM(
SELECT 
    data,
    ( SELECT 
            array_to_string( array_agg( data ),',' )  
        FROM (
            SELECT 
                data,
                length( data ) 
            FROM ( 
                SELECT 
                    replace( data, ' ', '' ) AS data 
                FROM 
                    regexp_split_to_table( data, '\S{2,}' ) AS data 
                ) t
            WHERE length( data ) > 2
        ) t ) AS replace_candidates
    FROM
        mystrings
) T
WHERE 
  replace_candidates IS NOT NULL

Working

Start looking at the inner most query first (the one with regexp_split_to_table)

  1. The regexg gets all strings that have 2 characters in a sequence (not separated by a space)
  2. regexp_split_to_table gets the inverse of the match, more on it here
  3. Replace spaces by a empty char and filter records having a length greater than 2

The reaming are array aggregate functions to take care of formatting, as per your requirement, more of this here

Results

H e l l o how are you --> {Hello}
I am fine, t h a n k you --> {thank}
This is s t r a n g e text --> {strange}
With c r a z y space b e t w e e n characters --> {crazy,between}
SOME MORE TEST T E X T --> {TEXT}

SQLFIDDLE

Note: It considers chars which fall in as [space][char][space], but, you can modify it to suit your needs as [space][space][char][space] or [space][char][special_char][space] ...

Hope this helps ;p

Upvotes: 1

Tomas Greif
Tomas Greif

Reputation: 22663

The following finds possible concatenation candidates:

 with mystrings as (
    select 'H e l l o, how are you'::varchar string union all
    select 'I am fine, t h a n k you'::varchar string union all
    select 'This is s t r a n g e text'::varchar string union all
    select 'With c r a z y space b e t w e e n characters'::varchar string 
)

, u as (
select string, strpart[rn] as strpart, rn
from  (
   select *, generate_subscripts(strpart, 1) as rn
   from  (
      select string, string_to_array(replace(string,',',''), ' ') as strpart
      from   mystrings
      ) x
   ) y
)

,w as (
select 
   string,strpart,rn, 
   case when length(strpart) = 1 then 1 else 0 end as indchar ,
   case when coalesce(length(lag(strpart) over()),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strstart,
   case when coalesce(length(lead(strpart) over()),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strend   
from u
) 


,x as (
   select 
      string,rn,strpart,indchar,strstart,
      sum(strstart) over (order by string, rn) as strid 
   from w 
   where indchar = 1 and not (strstart = 1 and strend = 1)
    )

select string, array_to_string(array_agg(strpart),'') as candidate from x group by string, strid 

Upvotes: 0

user2164964
user2164964

Reputation:

You can use a resource such as online dictionary if the word exists then you dont have to remove spaces otherwise remove spaces or you can use a table where you have to put all strings that exist and then you have to check with that table.Hope you got my point.

Upvotes: 0

Related Questions