Kupalzky

Reputation: 69

How do I match the beginning of every new line of text with a regex?

I'm creating a CSV file from a text file. I'm very new to regex, and I need to finish the CSV file.

What I need to do is remove the line breaks within each entry so that all of its lines end up on one single line.

For example, this data:

ABC Company INC
123 Some Street 
Winchester, KY

needs to be in this format:

ABC Company INC;123 Some Street;Winchester, KY

Also, my file has several entries, with one blank line after each company.

It's like this:

ABC Company
123 Street
Winchester, KY

DEF Company
456 Street
Winchester, KY

And I want to turn it into this:

ABC Company;123 Street;Winchester, KY
DEF Company;456 Street;Winchester, KY

Can this be done with a regex? If so, how?

More Info:

This is not a programming or coding issue.

It's more a matter of data conversion or manipulation. I'm only using a text editor. I need to edit the text file (mined data) and convert it to a CSV file.

If there are other tools that could be used for this, please mention them.

UPDATE:

For this particular problem, and at my current level of knowledge, I found Bohemian's answer more helpful. It worked well for the task.

However, the answer provided by Sobrique is more powerful. I just don't know how to use it well. With the Perl script, I copied the whole printed output because I didn't know how to write it to a file. I also ran into some inaccurate data. It's a great tool; I just can't handle it right now.

Upvotes: 0

Views: 127

Answers (2)

Bohemian

Reputation: 425053

Do a replace like this:

 Search: (?<=.)$(\s(?!^$))+^
Replace: ;

then, to remove the blank lines:

 Search: ^$\s+
Replace: <nothing>

The lookarounds are there to make sure that blank (zero-length) lines are not matched.
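
As a rough way to try the two replacements outside a text editor, here is a minimal Python sketch. It assumes that Python's re module with re.MULTILINE treats ^ and $ the way the editor does (regex flavors differ between editors, so this is an illustration, not a guarantee). The sample text comes from the question; the variable names are made up for the example.

```python
import re

# Sample input from the question: entries separated by one blank line.
text = (
    "ABC Company\n123 Street\nWinchester, KY\n"
    "\n"
    "DEF Company\n456 Street\nWinchester, KY"
)

# Step 1: replace each line break between two non-empty lines with ";".
# re.MULTILINE makes ^ and $ match at line boundaries.
joined = re.sub(r"(?<=.)$(\s(?!^$))+^", ";", text, flags=re.MULTILINE)

# Step 2: remove the blank separator lines that are left over.
result = re.sub(r"^$\s+", "", joined, flags=re.MULTILINE)

print(result)
```

Running the steps separately, as in the editor, makes it easy to inspect the intermediate text if a particular file does not come out as expected.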

Upvotes: 1

Sobrique

Reputation: 53478

Regular expressions aren't really the tool for this job. They're about pattern matching.

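You might find that tr is suitable, as you can transliterate linefeed to ;. On its own, though, it converts every linefeed, including the ones around the blank separator lines, so the entries would still need to be split apart afterwards.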

Alternatively in perl:

#!/usr/bin/perl

use strict;
use warnings;

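my @fields;    # lines of the entry currently being read

while (<DATA>) {
    chomp;
    if (m/^\s*$/) {
        # blank line: the entry is complete, so print it on one line
        print join(';', @fields), "\n" if @fields;
        @fields = ();
    }
    else {
        s/\s+$//;            # drop trailing whitespace
        push @fields, $_;
    }
}

# print the last entry, which has no blank line after it
print join(';', @fields), "\n" if @fields;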

__DATA__
ABC Company
123 Street
Winchester, KY

DEF Company
456 Street
Winchester, KY

Will do the trick.

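To turn this into a one-liner (collecting the lines of each entry so that no trailing ; is printed):

perl -lne 's/\s+$//; if (length) { push @f, $_ } else { print join ";", @f if @f; @f = () } END { print join ";", @f if @f }' yourfile

(This just prints the result. To save it, redirect the output to a file, for example by adding > output.csv at the end. perl -i enables 'in-place editing' instead.)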
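
For readers without Perl, the same idea can be sketched in Python. This is an illustrative equivalent, not part of the original answer: the function names join_records and convert_file are hypothetical, as are any file names you pass in. It also writes the result to a file, which the printed-output approach leaves to the shell.

```python
def join_records(text, sep=";"):
    """Join the lines of each blank-line-separated entry with `sep`."""
    rows, fields = [], []
    for line in text.splitlines():
        line = line.strip()          # also drops trailing spaces
        if line:
            fields.append(line)
        elif fields:
            # blank line: the current entry is complete
            rows.append(sep.join(fields))
            fields = []
    if fields:
        # last entry, which may have no blank line after it
        rows.append(sep.join(fields))
    return "".join(row + "\n" for row in rows)


def convert_file(src_path, dst_path):
    """Read src_path, join its entries, and write them to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        converted = join_records(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(converted)


sample = "ABC Company\n123 Street \nWinchester, KY\n\nDEF Company\n456 Street\nWinchester, KY\n"
print(join_records(sample), end="")
```

Calling convert_file with the input and output paths of your choice performs the whole conversion in one step. Because the addresses contain commas, keeping ; as the separator avoids having to quote the fields.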

Upvotes: 1
