Asynchronous

Reputation: 3977

Removing a comma between letters or numbers (anything except quotes) inside a quoted string?

Is there a way, perhaps a regex, to remove any commas that sit between a pair of double quotes and are surrounded by letters or numbers?

Not sure what else to do here and this is my last hope before I go looking at CSV Helpers:

I am using Visual Studio SSIS/BI to import text files into a DB. The problem is that SSIS will choke if the file contains data like this:

"Soccer rocks, yes it does"

To remedy this, I used the Replace method, which solved the problem temporarily. I am running this code in a Visual Studio BI/SSIS Script Task to convert the text file to CSV before sending it to the DB.

// Strips every ", " (comma followed by a space) from the text,
// then writes the result out line by line.
static void AddComma(string s, TextWriter writer)
{
    foreach (var line in s.Replace(", ", "").Split(new string[] { Environment.NewLine }, StringSplitOptions.None))
    {
        writer.WriteLine(line);
    }
    writer.Flush();
}

static void Main(string[] args)
{
    // Read the whole input file into memory.
    string a;
    using (TextReader reader = new StreamReader(@"C:\sample\test.txt"))
    {
        a = reader.ReadToEnd();
    }

    // Write the processed text to the CSV; disposing the writer also closes the file.
    using (var writer = new StreamWriter(new FileStream(@"C:\sample\test.csv", FileMode.Create)))
    {
        AddComma(a, writer);
    }
}

Note: I am replacing a comma followed by a single space:

Replace(", ", "");

The problem is if the data in the text file looks like this:

"Soccer rocks,yes it does"

The Replace method will not catch it, obviously.

So, is there a way, perhaps a regex, to remove any commas that sit between a pair of double quotes and are surrounded by letters or numbers?

So if the data looks like this: "Soccer rocks, yes it does" or "Soccer rocks 54,23 yes it does", then it should end up like this: "Soccer rocks yes it does"

I am not sure what is possible and am simply looking for some kind of solution.

Upvotes: 1

Views: 808

Answers (1)

Angga

Reputation: 2323

Did you mean something like this?

If yes, you can match with the regex pattern ("[\w\s]*),([\w\s]*") and take the first and second groups; joining them gives you the quoted text without the comma.
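For illustration, here is a minimal sketch of the capture-group idea above. The class and method names are hypothetical, and it assumes each quoted field holds only word characters and whitespace around a single comma. It also uses `+` rather than `*`, because with `*` the pattern would match the `","` separator between two adjacent quoted fields as well.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaptureGroupDemo
{
    // Hypothetical helper: drops one comma that sits inside a quoted field.
    // Group 1 = opening quote plus the text before the comma,
    // group 2 = the text after the comma plus the closing quote.
    public static string RemoveInnerComma(string input)
    {
        return Regex.Replace(input, "(\"[\\w\\s]+),([\\w\\s]+\")", "$1$2");
    }

    static void Main()
    {
        // prints: 1,"Soccer rocks yes it does",2
        Console.WriteLine(RemoveInnerComma("1,\"Soccer rocks, yes it does\",2"));
        // prints: "a","b"   (separator between quoted fields is left alone)
        Console.WriteLine(RemoveInnerComma("\"a\",\"b\""));
    }
}
```

Note that the comma is simply deleted, so "rocks,yes" becomes "rocksyes"; use "$1 $2" as the replacement if a space is wanted instead.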

If you are using C#, you are on the .NET regex engine, which allows unbounded repetition inside a lookbehind.

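So you can try something like Regex.Replace(s, @"(?<=""[\w\s]+),(?=[\w\s]+"")", "-"), which replaces the comma directly without needing to match and read groups. Note that string.Replace does not interpret regex patterns, so Regex.Replace is required; pass "" instead of "-" to drop the comma entirely.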
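As a rough sketch of the lookaround approach (names here are hypothetical): string.Replace does not interpret regex patterns, so the code below goes through Regex.Replace. The lookaround pattern only catches a single comma per quoted field, so a second helper is included that matches each whole quoted field and strips every comma in it. That second helper is a swapped-in alternative, not the pattern from this answer, and it assumes fields contain no escaped quotes.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // Removes a comma preceded by "<word/space chars> and followed by <word/space chars>".
    // Handles only one comma per quoted field.
    public static string RemoveWithLookaround(string input)
    {
        return Regex.Replace(input, @"(?<=""[\w\s]+),(?=[\w\s]+"")", "");
    }

    // Alternative: match each quoted field as a whole and delete all commas inside it.
    // Assumption: quotes are never escaped inside a field.
    public static string RemoveAllInQuotes(string input)
    {
        return Regex.Replace(input, "\"[^\"]*\"", m => m.Value.Replace(",", ""));
    }

    static void Main()
    {
        // prints: 7,"Soccer rocksyes it does",9
        Console.WriteLine(RemoveWithLookaround("7,\"Soccer rocks,yes it does\",9"));
        // prints: 7,"Soccer rocks 5423 yes it does",9
        Console.WriteLine(RemoveAllInQuotes("7,\"Soccer rocks 54,23 yes, it does\",9"));
    }
}
```

In both cases the commas that separate fields are left untouched, which is what SSIS needs to split the columns.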

Upvotes: 1
