Reputation: 156
I have the following data (in one line):
<span id="ctb_0" onclick="show_hide_box(this);"
class="hide_icon r txtfont ltr">open</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Rayyan Real Investment</font>,
<span class="ltr txtfont">+92-3212459990</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Bukhari Properties</font>,
<span class="ltr txtfont">+92-3218248858</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Exact Properties</font>,
<span class="ltr txtfont">+92-3312044421</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Exact Properties</font>,
<span class="ltr txtfont">+92-3312044421</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Adeel Corporation</font>,
<span class="ltr txtfont">+923008253132</span>
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Adeel Corporation</font>,
<span class="ltr txtfont">+92-3008253132</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Z.S Associates</font>,
<span class="ltr txtfont">+92-3452431417</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Keystone Properties</font>,
<span class="ltr txtfont">+92-3353509187/301..</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Adeel Corporation</font>,
<span class="ltr txtfont">+92-3008253132</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Adeel Corporation</font>,
<span class="ltr txtfont">+92-3008253132</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Safeway Real Estate Consultant</font>,
<span class="ltr txtfont">+92-3218282885/345..</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Abdul Sattar & Sons</font>,
<span class="ltr txtfont">+92-3332107802, +9..</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Bismillah Real Estate</font>,
<span class="ltr txtfont">+92-3213336525, 03..</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Aiman Estate & Properties</font>,
<span class="ltr txtfont">+92-3212537535</span>,
<div class="description clr ltr txtfont">…</div>,
<font class="txtfont ltr">Aiman Estate & Properties</font>,
<span class="ltr txtfont">+92-3212537535</span>,
Using a regex in Notepad++, I want to turn this into a list like:
923008929845
923318874928
923008275080
923452113010
923002024486
923218286664
923218286664
923212804245
923002555091
923212804245
923008289996
923003579717
923003579717
923003772227
923007048836
I have tried the following in Notepad++, but it is neither clean nor quick. I end up removing the HTML by hand, which keeps me from finishing my data scraping quickly.
Find what: [a-z]|[A-Z]|[,.()_=;"+<>/:-]
Replace with: (Spacebar)
After this I still see lots of random characters.
Upvotes: 1
Views: 1879
Reputation: 430
Try this:
Find what: \s.*\s.*?(\d+)-(\d{10})|.+
Replace with: $1$2
Note: this is as far as my regex knowledge goes, and I am not good at regex. The pattern above worked correctly, except that 2 spaces were left between the digits.
Upvotes: 0
Reputation: 11216
I don't have Notepad++, but something like this should get you most of the way there. It matches everything up to the end of the first occurrence of the number pattern you are looking for, and replaces that entire match with the captured number groups and a line feed. A Replace All should repeat this for every number.
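The approach described here can be sketched in Python. This is only an illustration: the pattern is an assumption of what such a regex might look like (it is not the one this answer had in mind), and the input is a shortened, hypothetical one-line sample built from entries in the question.

```python
import re

# Assumed pattern: lazily consume everything up to and including
# the next "+NN-NNNNNNNNNN" number, capturing the two digit groups.
PATTERN = re.compile(r'.*?\+(\d{2})-(\d{10})', re.DOTALL)

# Hypothetical one-line input built from entries in the question.
data = (
    '<font class="txtfont ltr">Rayyan Real Investment</font>, '
    '<span class="ltr txtfont">+92-3212459990</span>, '
    '<font class="txtfont ltr">Bukhari Properties</font>, '
    '<span class="ltr txtfont">+92-3218248858</span>,'
)

# "Replace all": each match becomes the two captured groups plus a line feed.
replaced = PATTERN.sub(r'\1\2\n', data)

# Text after the last number is never matched, so keep only digit-only lines.
numbers = [line for line in replaced.splitlines() if line.isdigit()]
print(numbers)
```

The leftover markup after the last number is why this gets you "most of the way there" rather than all the way: one final cleanup step is still needed.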
Upvotes: -1
Reputation: 91518
How about:
Find what: ^.*?\+(\d\d)-(\d{10}).*?$
Replace with: $1$2\n
Explanation:
^ : beginning of line
.*? : 0 or more of any character (not greedy)
\+ : a literal +, which needs to be escaped because + is a special character in regex
(\d\d) : 2 digits captured in group 1
- : dash
(\d{10}) : 10 digits captured in group 2
.*? : 0 or more of any character (not greedy)
$ : end of line
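To check the pattern outside Notepad++, here is a minimal Python sketch. It assumes each entry sits on its own line (the sample below is two `<span>` lines taken from the question), and it uses `\1\2` as the replacement without the extra `\n`, since the lines are already separated.

```python
import re

# Pattern from the explanation above. re.MULTILINE makes ^ and $ match
# at every line boundary rather than only at the ends of the whole string.
PATTERN = re.compile(r'^.*?\+(\d\d)-(\d{10}).*?$', re.MULTILINE)

# Sample input: two <span> lines from the question, one per line (assumed layout).
sample = (
    '<span class="ltr txtfont">+92-3212459990</span>,\n'
    '<span class="ltr txtfont">+92-3353509187/301..</span>,'
)

# Python writes group references as \1\2 where Notepad++ uses $1$2.
result = PATTERN.sub(r'\1\2', sample)
print(result)
```

Lines that contain no `+NN-` number (for example the `<div>` and `<font>` lines) do not match and are left untouched, and if all the data really is on a single line, only the first number on that line is kept.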
Upvotes: 3