yeahumok

Reputation: 2952

RegEx -- getting rid of double whitespaces?

I have an app that goes through file names and replaces "invalid" characters (as defined by my regex) with a blank space. If that leaves 2 or more consecutive spaces in a file name, I want them collapsed into a single space. For example:

After my app runs, Deal A & B.txt is renamed to Deal A   B.txt (3 spaces between A and B). What I really want is Deal A B.txt (one space between A and B).

I'm trying to work out how to do this. I suppose my app will have to run through all file names once to replace the invalid characters, and then run through them again to remove the extra whitespace.

Can anybody help me with this?
Here is my current code for replacing the invalid characters:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Windows.Forms;

public partial class CleanNames : Form
{
    public CleanNames()
    {
        InitializeComponent();
    }

    public void Sanitizer(List<string> paths)
    {
        // Any run of these characters is replaced by a single space.
        string regPattern = (@"[~#&$!%+{}]+");
        string replacement = " ";

        Regex regExPattern = new Regex(regPattern);

        // How many times each sanitized path has already been taken (for numbering duplicates).
        var filesCount = new Dictionary<string, int>();

        dataGridView1.Rows.Clear();

        // Append to the error log; the using block flushes and closes it.
        using (StreamWriter errors = new StreamWriter(@"S:\Testing\Errors.txt", true))
        {
            try
            {
                foreach (string files2 in paths)
                {
                    string filenameOnly = System.IO.Path.GetFileName(files2);
                    string pathOnly = System.IO.Path.GetDirectoryName(files2);
                    string sanitizedFileName = regExPattern.Replace(filenameOnly, replacement);
                    string sanitized = System.IO.Path.Combine(pathOnly, sanitizedFileName);

                    // Nothing to rename. Without this check the file "already exists"
                    // as itself and would be renamed with a numeric suffix.
                    if (sanitizedFileName == filenameOnly)
                    {
                        continue;
                    }

                    if (!System.IO.File.Exists(sanitized))
                    {
                        DataGridViewRow clean = new DataGridViewRow();
                        clean.CreateCells(dataGridView1);
                        clean.Cells[0].Value = pathOnly;
                        clean.Cells[1].Value = filenameOnly;
                        clean.Cells[2].Value = sanitizedFileName;
                        dataGridView1.Rows.Add(clean);

                        System.IO.File.Move(files2, sanitized);
                    }
                    else
                    {
                        // Target name is taken: append a counter before the extension.
                        if (filesCount.ContainsKey(sanitized))
                        {
                            filesCount[sanitized]++;
                        }
                        else
                        {
                            filesCount.Add(sanitized, 1);
                        }

                        string newFileName = String.Format("{0}{1}{2}",
                            System.IO.Path.GetFileNameWithoutExtension(sanitized),
                            filesCount[sanitized].ToString(),
                            System.IO.Path.GetExtension(sanitized));
                        string newFilePath = System.IO.Path.Combine(System.IO.Path.GetDirectoryName(sanitized), newFileName);
                        System.IO.File.Move(files2, newFilePath);

                        DataGridViewRow clean = new DataGridViewRow();
                        clean.CreateCells(dataGridView1);
                        clean.Cells[0].Value = pathOnly;
                        clean.Cells[1].Value = filenameOnly;
                        clean.Cells[2].Value = newFileName;
                        dataGridView1.Rows.Add(clean);
                    }
                }
            }
            catch (Exception e)
            {
                errors.WriteLine(e);
            }
        }
    }

    private void SanitizeFileNames_Load(object sender, EventArgs e)
    { }

    private void dataGridView1_CellContentClick(object sender, DataGridViewCellEventArgs e)
    {
    }

    private void button1_Click(object sender, EventArgs e)
    {
        Application.Exit();
    }
}

The problem is that not all files end up with the same number of blank spaces after a rename. For example, Deal A&B.txt becomes Deal A B.txt (1 space between A and B, which is fine). But I also have files like Deal A & B & C.txt, which after a rename become Deal A   B   C.txt (3 spaces between A, B and C, which is not acceptable).
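A minimal sketch that reproduces the behaviour described above using only the first pass, with the same pattern as regPattern in Sanitizer. The class name `ReproInvalidChars` is hypothetical. Each run of invalid characters becomes one space, but the spaces that already surrounded it are left untouched, so every " & " turns into three spaces.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: only the question's first pass, no file renaming.
public static class ReproInvalidChars
{
    // Same character class as regPattern in Sanitizer.
    private static readonly Regex Invalid = new Regex(@"[~#&$!%+{}]+");

    public static string Replace(string fileName)
    {
        // A run of invalid characters becomes one space;
        // spaces on either side of the run are kept as they are.
        return Invalid.Replace(fileName, " ");
    }
}
```

With this, `ReproInvalidChars.Replace("Deal A&B.txt")` gives "Deal A B.txt", while `ReproInvalidChars.Replace("Deal A & B & C.txt")` gives "Deal A   B   C.txt".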

Does anybody have any ideas/code for how to accomplish this?

Upvotes: 2

Views: 714

Answers (6)

Scott Pedersen

Reputation: 1311

Just add a space to the character class in your regPattern, i.e. @"[~#&$!%+{} ]+". Any run of invalid characters and spaces will then be replaced with a single space. You may waste a little time replacing a single space with a space, but on the other hand you won't need a second string manipulation call.
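A minimal sketch of this single-pass idea, assuming the space goes inside the existing character class, giving @"[~#&$!%+{} ]+". The class and method names are hypothetical. Because the space is part of the class, " & " is matched as one run and replaced by one space.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: one Replace call handles invalid characters and extra spaces.
public static class SinglePassSanitizer
{
    // The question's character class plus a literal space before the closing bracket.
    private static readonly Regex InvalidOrSpace = new Regex(@"[~#&$!%+{} ]+");

    public static string Sanitize(string fileName)
    {
        // " & " (space, ampersand, space) is one match, so it becomes one space.
        return InvalidOrSpace.Replace(fileName, " ");
    }
}
```

`SinglePassSanitizer.Sanitize("Deal A & B & C.txt")` returns "Deal A B C.txt". Note that tabs are not in the class, and a space left just before the extension is not trimmed: "Deal A &.txt" becomes "Deal A .txt".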

Upvotes: 4

Fosco

Reputation: 38526

After you're done sanitizing it your way, simply keep replacing 2 spaces with 1 space for as long as the string still contains 2 consecutive spaces:

while (mystring.Contains("  ")) mystring = mystring.Replace("  "," ");

I think that's the right syntax...
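The `while` loop matters because a single `String.Replace("  ", " ")` scans left to right and does not re-examine the text it has just produced, so three spaces only shrink to two. A small sketch with hypothetical names contrasting one pass with the loop:

```csharp
// Hypothetical helper illustrating the loop above.
public static class SpaceCollapser
{
    // One pass: "A   B" (three spaces) becomes "A  B" (two spaces),
    // because the third space is left over after the first match.
    public static string OnePass(string s)
    {
        return s.Replace("  ", " ");
    }

    // Repeat until no double space remains.
    public static string Collapse(string s)
    {
        while (s.Contains("  "))
        {
            s = s.Replace("  ", " ");
        }
        return s;
    }
}
```

`SpaceCollapser.Collapse("Deal A   B   C.txt")` needs two passes and returns "Deal A B C.txt".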

Upvotes: 1

CodingWithSpike

Reputation: 43718

Does this help?

        var regex = new System.Text.RegularExpressions.Regex("\\s{2,}");
        var result = regex.Replace("Some text  with a   lot      of spaces,   and 2\t\ttabs.", " ");
        Console.WriteLine(result);

The output is:

Some text with a lot of spaces, and 2 tabs.

It just replaces any sequence of 2 or more whitespace characters with a single space...


Edit:

To clarify, I would just perform this regex right after your existing one:

public void Sanitizer(List<string> paths)
{
    string regPattern = (@"[~#&$!%+{}]+");
    string replacement = " ";

    Regex regExPattern = new Regex(regPattern);
    Regex regExPattern2 = new Regex(@"\s{2,}");

and:

          foreach (string files2 in paths)
          {

            string filenameOnly = System.IO.Path.GetFileName(files2);
            string pathOnly = System.IO.Path.GetDirectoryName(files2);
            string sanitizedFileName = regExPattern.Replace(filenameOnly, replacement);
            sanitizedFileName = regExPattern2.Replace(sanitizedFileName, replacement); // clean up whitespace
            string sanitized = System.IO.Path.Combine(pathOnly, sanitizedFileName);

I hope that makes more sense.
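A self-contained sketch of the two `Replace` calls above, pulled out of `Sanitizer` so they can be tried without touching the file system. The class and method names are hypothetical; the patterns are the ones from the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the two passes from the edited Sanitizer, without file I/O.
public static class TwoPassSanitizer
{
    private static readonly Regex Invalid = new Regex(@"[~#&$!%+{}]+");
    // Two or more whitespace characters (spaces, tabs, ...).
    private static readonly Regex ExtraWhitespace = new Regex(@"\s{2,}");

    public static string SanitizeFileName(string fileName)
    {
        // Pass 1: invalid characters become one space (this can create runs of spaces).
        string result = Invalid.Replace(fileName, " ");
        // Pass 2: any remaining run of 2+ whitespace characters becomes one space.
        return ExtraWhitespace.Replace(result, " ");
    }
}
```

As in the answer, this should be applied to the file name only (the result of `Path.GetFileName`), not to the full path.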

Upvotes: 2

Mau

Reputation: 14468

As Fosco said, with formatting (the markers under the code point at the two-space search string and the one-space replacement):

while (mystring.Contains("  ")) mystring = mystring.Replace("  "," ");
//                        ||                                 ||   |

Upvotes: 1

ULysses

Reputation: 978

You can perform another regex replace after your first one, replacing every run of one or more spaces with a single space:

@" +" -> " "
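In C#, that second replace could look like the sketch below (class and method names are hypothetical). The pattern " +" matches literal spaces only, so tabs are left alone.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for the second replace: " +" -> " ".
public static class SpaceRunReplacer
{
    public static string CollapseSpaces(string s)
    {
        // " +" matches one or more literal spaces (not tabs);
        // a single space is replaced by itself, longer runs shrink to one.
        return Regex.Replace(s, " +", " ");
    }
}
```

Run on the output of the first pass, "Deal A   B   C.txt" becomes "Deal A B C.txt".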

Upvotes: 1

JSBձոգչ

Reputation: 41378

Do the local (C#) equivalent of this Perl substitution:

s/\s+/ /g;
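A sketch of a C# counterpart, with hypothetical names. `Regex.Replace` replaces every match by default, so no equivalent of Perl's /g flag is needed. Unlike `\s{2,}`, the pattern `\s+` also turns a single tab or newline into a space.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: C# counterpart of Perl's  s/\s+/ /g;
public static class PerlStyle
{
    public static string SquashWhitespace(string s)
    {
        // Regex.Replace already replaces all matches, like /g.
        // \s+ matches any run of whitespace, including a single tab.
        return Regex.Replace(s, @"\s+", " ");
    }
}
```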

Upvotes: 5
