Reputation: 49
I've been trying to standardize the "Address" field of a table with 400k lines. Problem is, this field was made to be a free form field, meaning that users (upon registering) could just enter anything they wanted into it. I've already managed to split this address field into 14 others with each word of the adress line in order.
Now i need to concatenate some of these fields back together into different fields such as:
I wish it could be as easy as grabbing the first 3 fields and calling that the street name and the 4th field as House number but due to different street name lengths, the field with the "house number" is usually fields #4, #5 and rarely ever #3 or #6.
I've already thought of an approach that could work in this case and that would be concatenating these fields in a loop and using the first occurence of a field where its first digit is a number as a break point.(due to some houses being numbered "10C", "1A", "1B" and so on)
Due to not being not very good at PL/SQL, i don't know how to put this idea into code.
All that i've managed to do so far was writing a function to check if a string starts with a digit or not, so it could be used in an IF function.
How can i dynamically "traverse" the fields in a loop using PL/SQL? Would i use an array? Is it even possible?
EDIT: Examples of what this address field contains (in portuguese):
Avenida Doutor Theomário Pinto da Costa 450 Condominio Renaissance, rua 1, casa 1
Rua Álvaro Peres Filho 60 Casa azul em frente ao orelhão
Travessa Delegado Zé Lima 61 antiga Praça Rio Branco
Rua Finlândia 28 Qd 111
Alameda Áustria 107 Condomínio Jardim Europa I
There is a clear pattern in the data, which leads me to believe that this address data was colected in a segmented format, but then concatenated into a single table field. What i need to do is the reverse, basically.
The pattern goes:
Avenida(1) Doutor Theomário Pinto da Costa (2) 450(3) Condominio Renaissance, rua 1, casa 1(4)
1 = Street type (avenue, street and etc)
2 = Street Name
3 = House Number
4 = Reference (usually used to help with directions, this field is usually open by design. Things like Floor, building and Condo name goes here.)
As you can see from the examples above, the streets usually have widly different name lenghts, which makes setting a specific number of fields as the street name impossible.
Upvotes: 1
Views: 542
Reputation: 571
A posible solution:
Declare
/* Dynamically concatenating strings using a loop in Oracle PL/SQL */
--
type tyArray_parsed
is table of varchar2(2000)
index by binary_integer;
--
arAddress_parsed tyArray_parsed;
--
cnuSTREET_TYPE constant integer:=1;
cnuSTREET_NAME constant integer:=2;
cnuHOUSE_NUMBER constant integer:=3;
cnuREFERENCE constant integer:=4;
--
Procedure print(p_isbTexto varchar2)
is
nuIzq_print integer;
nuDer_print integer;
Begin
nuDer_print:=0;
Loop
nuIzq_print:=nuDer_print+1;
nuDer_print:=instr(p_isbTexto,chr(10),nuIzq_print);
exit when nvl(nuDer_print,0)=0;
dbms_output.put_line(substr(p_isbTexto, nuIzq_print, nuDer_print-nuIzq_print));
End loop;
dbms_output.put_line(substr(p_isbTexto, nuIzq_print));
End print;
--
procedure pAddress_split(isbAddress_string varchar2,
oarAddress_parsed out tyArray_parsed
)
is
type tyArray_string
is table of varchar2(2000)
index by binary_integer;
--
arFields tyArray_string;
nuLeft_pos integer;
nuRight_pos integer;
nuMy_number number;
blIs_number boolean;
nuStep integer;
Begin
print('---------------------------------------');
print('isbAddress_string=['||isbAddress_string||']');
--
nuRight_pos:=0;
Loop
nuLeft_pos:=nuRight_pos+1;
nuRight_pos:=instr(isbAddress_string,' ',nuLeft_pos);
exit when nvl(nuRight_pos,0)=0;
arFields(arFields.COUNT+1):=substr(isbAddress_string,
nuLeft_pos,
nuRight_pos-nuLeft_pos
);
End loop;
arFields(arFields.COUNT+1):=substr(isbAddress_string,
nuLeft_pos
);
print('arFields.COUNT='||arFields.COUNT);
--
For nuI in arFields.FIRST..arFields.LAST loop
print(lpad(nuI,4)||'['||arFields(nui)||']');
End loop;
--
nuStep:=cnuSTREET_TYPE;
For nuI in arFields.FIRST..arFields.LAST loop
if nuStep=cnuSTREET_TYPE then
oarAddress_parsed(cnuSTREET_TYPE):=arFields(nui);
nuStep:=cnuSTREET_NAME;
Elsif nuStep=cnuSTREET_NAME then
blIs_number:=false;
Begin
nuMy_number:=to_number(arFields(nui));
blIs_number:=true;
Exception
when others then
blIs_number:=false;
End;
if blIs_number then
oarAddress_parsed(cnuHOUSE_NUMBER):=arFields(nui);
nuStep:=cnuREFERENCE;
Else
if not(oarAddress_parsed.exists(cnuSTREET_NAME)) then
oarAddress_parsed(cnuSTREET_NAME):=arFields(nui);
Else
oarAddress_parsed(cnuSTREET_NAME):=oarAddress_parsed(cnuSTREET_NAME)||' '||arFields(nui);
End if;
End if;
Elsif nuStep=cnuREFERENCE then
if not(oarAddress_parsed.exists(cnuREFERENCE)) then
oarAddress_parsed(cnuREFERENCE):=arFields(nui);
Else
if arFields(nui)=',' then
oarAddress_parsed(cnuREFERENCE):=oarAddress_parsed(cnuREFERENCE)||arFields(nui);
Else
oarAddress_parsed(cnuREFERENCE):=oarAddress_parsed(cnuREFERENCE)||' '||arFields(nui);
end if;
End if;
End if;
End loop;
--
for nuI in oarAddress_parsed.FIRST..oarAddress_parsed.LAST loop
print(lpad(nuI,2)||'|['||oarAddress_parsed(nui)||']');
End loop;
End pAddress_split;
Begin
dbms_application_info.set_module('Split','START-'||to_char(sysdate,'yyyy-mm-dd hh24:mi:ss'));
--
pAddress_split('Avenida Doutor Theomário Pinto da Costa 450 Condominio Renaissance, rua 1, casa 1',arAddress_parsed);
pAddress_split('Rua Álvaro Peres Filho 60 Casa azul em frente ao orelhão',arAddress_parsed);
pAddress_split('Travessa Delegado Zé Lima 61 antiga Praça Rio Branco',arAddress_parsed);
pAddress_split('Rua Finlândia 28 Qd 111',arAddress_parsed);
pAddress_split('Alameda Áustria 107 Condomínio Jardim Europa I',arAddress_parsed);
--
--
dbms_application_info.set_module('Split','END-'||to_char(sysdate,'yyyy-mm-dd hh24:mi:ss'));
End;
/
Answer:
---------------------------------------
isbAddress_string=[Avenida Doutor Theomário Pinto da Costa 450 Condominio Renaissance, rua 1, casa 1]
arFields.COUNT=13
1[Avenida]
2[Doutor]
3[Theomário]
4[Pinto]
5[da]
6[Costa]
7[450]
8[Condominio]
9[Renaissance,]
10[rua]
11[1,]
12[casa]
13[1]
1|[Avenida]
2|[Doutor Theomário Pinto da Costa]
3|[450]
4|[Condominio Renaissance, rua 1, casa 1]
---------------------------------------
isbAddress_string=[Rua Álvaro Peres Filho 60 Casa azul em frente ao orelhão]
arFields.COUNT=11
1[Rua]
2[Álvaro]
3[Peres]
4[Filho]
5[60]
6[Casa]
7[azul]
8[em]
9[frente]
10[ao]
11[orelhão]
1|[Rua]
2|[Álvaro Peres Filho]
3|[60]
4|[Casa azul em frente ao orelhão]
---------------------------------------
isbAddress_string=[Travessa Delegado Zé Lima 61 antiga Praça Rio Branco]
arFields.COUNT=9
1[Travessa]
2[Delegado]
3[Zé]
4[Lima]
5[61]
6[antiga]
7[Praça]
8[Rio]
9[Branco]
1|[Travessa]
2|[Delegado Zé Lima]
3|[61]
4|[antiga Praça Rio Branco]
---------------------------------------
isbAddress_string=[Rua Finlândia 28 Qd 111]
arFields.COUNT=5
1[Rua]
2[Finlândia]
3[28]
4[Qd]
5[111]
1|[Rua]
2|[Finlândia]
3|[28]
4|[Qd 111]
---------------------------------------
isbAddress_string=[Alameda Áustria 107 Condomínio Jardim Europa I]
arFields.COUNT=7
1[Alameda]
2[Áustria]
3[107]
4[Condomínio]
5[Jardim]
6[Europa]
7[I]
1|[Alameda]
2|[Áustria]
3|[107]
4|[Condomínio Jardim Europa I]
Upvotes: 1