Niuq Navig

Reputation: 81

Code Treats .txt File Differently When Saved

I have an input .txt file that looks something like this.

command1 param1
command2       param2
command3       param3
command4 param4

I am trying to collapse the extra whitespace between each command and its parameter, so I wrote the code below.

string[] output = File.ReadAllText(InputFilePath).Split('\n').Select(s => Regex.Replace(s, @"\s+", " ")).ToArray();

File.WriteAllLines(OutputFilePath, output);

If I run the code on the file as-is, the code does not work.

However, if I open the input file manually, save it without changing anything, and then run the code again, it works fine.

I believe this is some sort of UTF-16/8 issue but I am not sure how to account for it. What can I do?

Upvotes: 1

Views: 58

Answers (1)

Niuq Navig

Reputation: 81

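In this specific case, the input file contained "invisible control characters and unused code points", so this was not really a UTF-16/UTF-8 problem. Presumably the editor dropped those characters when the file was re-saved, which would explain why the code then worked. The \p{C} class matches the Unicode "Other" category (control, format, surrogate, private-use and unassigned characters), and removing those characters with a regular expression resolved the issue. Note that the line below only strips those characters; the \s+ replacement from the question is still needed to collapse the whitespace, and because tab is itself a control character, \p{C} will delete tabs rather than turn them into spaces.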

string[] output = File.ReadAllLines(InputFilePath).Select(s => Regex.Replace(s, @"\p{C}+", "")).ToArray();
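For anyone who wants both steps in one place, here is a minimal sketch of how they might be combined. The names WhitespaceCleaner, CleanLine and CleanFile are made up for illustration and are not from my original code. Whitespace is collapsed first so that tabs become spaces before \p{C} gets a chance to delete them, and spaces are collapsed once more afterwards in case an invisible character was sitting between two spaces.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class WhitespaceCleaner
{
    public static string CleanLine(string line)
    {
        // 1. Turn every whitespace run (including tabs) into a single space.
        string collapsed = Regex.Replace(line, @"\s+", " ");

        // 2. Remove Unicode "Other" characters: control, format,
        //    surrogate, private-use and unassigned code points.
        string visible = Regex.Replace(collapsed, @"\p{C}+", "");

        // 3. Removing a character that sat between two spaces can leave
        //    a double space, so collapse runs of spaces once more.
        return Regex.Replace(visible, " +", " ");
    }

    public static void CleanFile(string inputPath, string outputPath)
    {
        // ReadAllLines splits on \r\n as well as \n, so no stray \r remains.
        string[] output = File.ReadAllLines(inputPath)
                              .Select(CleanLine)
                              .ToArray();
        File.WriteAllLines(outputPath, output);
    }
}
```

Usage would be something like WhitespaceCleaner.CleanFile(InputFilePath, OutputFilePath). If leading or trailing spaces matter, a Trim() on the result of CleanLine may also be worth adding; I left it out to stay close to the original behavior.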

Upvotes: 1
