Reputation: 3977
Is there a way, maybe even a regex, to remove any commas that are enclosed in a pair of double quotes and surrounded by letters or numbers?
Not sure what else to do here and this is my last hope before I go looking at CSV Helpers:
I am using Visual Studio SSIS/BI to import text files into a DB. The problem is that SSIS will choke if the file contains data like this:
"Soccer rocks, yes it does"
To remedy this, I used the Replace method, which solved the problem temporarily.
I am running this code in a Visual Studio BI/SSIS Script Task to process the text file into CSV before sending it to the DB:
static void AddComma(string s, TextWriter writer)
{
    // Strip every ", " (comma followed by a space), then write the text back line by line.
    foreach (var line in s.Replace(", ", "").Split(new string[] { Environment.NewLine }, StringSplitOptions.None))
    {
        writer.WriteLine(line);
    }
    writer.Flush();
}
static void Main(string[] args)
{
    string a;
    // Read the whole input file; "using" closes the reader even if an exception is thrown.
    using (TextReader reader = new StreamReader(@"C:\sample\test.txt"))
    {
        a = reader.ReadToEnd();
    }
    // Create (or overwrite) the output file and write the cleaned text to it.
    using (var writer = new StreamWriter(@"C:\sample\test.csv"))
    {
        AddComma(a, writer);
    }
}
Note: I am replacing a comma followed by a single space:
Replace(", ", "");
The problem is if the data in the text file looks like this:
"Soccer rocks,yes it does"
The Replace method will not catch it, obviously.
Is there a way, maybe even a regex, to remove any commas that are enclosed in a pair of double quotes and surrounded by letters or numbers?
So if the data looks like this: "Soccer rocks, yes it does" or "Soccer rocks 54,23 yes it does" then it will end up like this: "Soccer rocks yes it does"
I am not sure what is possible and am simply looking for some kind of solution.
Upvotes: 1
Views: 808
Reputation: 2323
Did you mean something like this?
If so, you can match with the regex pattern ("[\w\s]*),([\w\s]*") and then join the first and second capture groups to get what you need.
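A minimal sketch of the capture-group approach, assuming the input is held in a string. The class name GroupDemo and the helper RemoveComma are made up for illustration. The replacement "$1$2" writes back the two groups and leaves out the comma between them:

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Hypothetical helper: group 1 is the quoted text before the comma,
    // group 2 is the quoted text after it; "$1$2" joins them without the comma.
    public static string RemoveComma(string input)
    {
        return Regex.Replace(input, @"(""[\w\s]*),([\w\s]*"")", "$1$2");
    }

    static void Main()
    {
        // prints "Soccer rocks yes it does" (including the double quotes)
        Console.WriteLine(RemoveComma("\"Soccer rocks, yes it does\""));
    }
}
```

Because the character classes only allow word characters and whitespace, this handles a single comma per quoted value; a value with two or more commas is left unchanged.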
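Since you are using C#, you have the .NET regex engine, which supports unbounded repetition inside a lookbehind.
So you can try something like Regex.Replace(s, @"(?<=""[\w\s]+),(?=[\w\s]+"")", "-"), which replaces the comma directly without needing to match and extract groups. Note that this has to be Regex.Replace, since string.Replace does not interpret patterns; pass "" as the replacement if you want the comma removed rather than turned into "-".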
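Here is a rough sketch of the lookaround version, plus a variant that is my own assumption rather than part of the answer above. RemoveAllQuotedCommas uses a MatchEvaluator to strip every comma inside each quoted value, which covers values containing more than one comma. It assumes values never contain escaped quotes. The class and method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // Lookbehind/lookahead: only the comma itself is matched, so it can be
    // replaced directly. Handles one comma per quoted value.
    public static string RemoveComma(string input)
    {
        return Regex.Replace(input, @"(?<=""[\w\s]+),(?=[\w\s]+"")", "");
    }

    // Assumed extension: match each "..." value and delete all commas in it.
    // Does not handle escaped quotes ("") inside a value.
    public static string RemoveAllQuotedCommas(string input)
    {
        return Regex.Replace(input, @"""[^""]*""", m => m.Value.Replace(",", ""));
    }

    static void Main()
    {
        // prints 1,"Soccer rocks 5423 yes it does",x
        Console.WriteLine(RemoveComma("1,\"Soccer rocks 54,23 yes it does\",x"));
        // prints 1,"abc",x
        Console.WriteLine(RemoveAllQuotedCommas("1,\"a,b,c\",x"));
    }
}
```

Both versions leave the delimiter commas outside the quotes alone in these samples. Regex is still a heuristic here, so if the files can contain escaped quotes or other edge cases, a real CSV parser is probably the safer route.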
Upvotes: 1