Reputation: 81
I have an input .txt file that looks something like this.
command1 param1
command2 param2
command3 param3
command4 param4
I am trying to collapse the extra whitespace between each command and its parameter into a single space, so I wrote the code below.
string[] output = File.ReadAllText(InputFilePath).Split('\n').Select(s => Regex.Replace(s, @"\s+", " ")).ToArray();
File.WriteAllLines(OutputFilePath, output);
If I run the code on the file as-is, it does not work.
However, if I manually open the input file and save it without changing anything, then run the code again, it works fine.
I believe this is some sort of UTF-16/UTF-8 encoding issue, but I am not sure how to account for it. What can I do?
Upvotes: 1
Views: 58
Reputation: 81
In this specific case the file contained "invisible control characters and unused code points", which is what the Unicode category \p{C} ("Other") matches. Removing those characters with a regular expression resolved the issue.
string[] output = File.ReadAllLines(InputFilePath).Select(s => Regex.Replace(s, @"\p{C}+", "")).ToArray();
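One thing to watch for: tab (U+0009) is itself in category \p{C}, so the line above deletes tabs outright instead of turning them into a space. If the commands and parameters might be separated by tabs, a variant along the following lines may be safer. This is only a sketch: the class and method names (WhitespaceCleaner, CleanLine, CleanFile) are made up for illustration, and it assumes the goal is still one space between tokens. It uses .NET character class subtraction to remove \p{C} characters except those that \s already treats as whitespace, then collapses the remaining whitespace as in the question.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class WhitespaceCleaner
{
    public static string CleanLine(string line)
    {
        // [\p{C}-[\s]] is category "Other" minus anything \s matches,
        // so NUL, zero-width and similar characters go, but tabs survive this step.
        string visible = Regex.Replace(line, @"[\p{C}-[\s]]+", "");

        // Collapse each run of whitespace (including tabs) into one space.
        return Regex.Replace(visible, @"\s+", " ");
    }

    public static void CleanFile(string inputPath, string outputPath)
    {
        // ReadAllLines splits on \r\n, \n, and \r, so no stray \r is left on the lines.
        string[] output = File.ReadAllLines(inputPath).Select(CleanLine).ToArray();
        File.WriteAllLines(outputPath, output);
    }
}
```

With the variable names from the question, it would be called as WhitespaceCleaner.CleanFile(InputFilePath, OutputFilePath);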
Upvotes: 1