shamp00

Reputation: 11326

Efficient way to unindent lines of code stored in a string

I have a string[] which contains code. Each line contains some leading spaces. I need to 'unindent' the code as much as possible without changing the existing formatting.

For instance the contents of my string[] might be

                                         public class MyClass
                                         {
                                             private bool MyMethod(string s)
                                             {
                                                 return s == "";
                                             }
                                         }

I'd like to find a reasonably elegant and efficient method (LINQ?) to transform it to

public class MyClass
{
    private bool MyMethod(string s)
    {
        return s == "";
    }
}

To be clear I'm looking for

IEnumerable<string> UnindentAsMuchAsPossible(string[] content)
{
    return ???;
}

Upvotes: 5

Views: 659

Answers (6)

Timothy Shields

Reputation: 79461

Building on Tim Schmelter's answer:

static IEnumerable<string> UnindentAsMuchAsPossible(IEnumerable<string> lines, int tabWidth = 4)
{
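    // Only lines with visible content should influence the minimum.
    // Min() also throws on an empty sequence, so handle that case up front.
    var contentLines = lines.Where(line => !string.IsNullOrWhiteSpace(line)).ToList();
    if (contentLines.Count == 0)
    {
        return lines;
    }

    var minDistance = contentLines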
        .Min(line => line
            .TakeWhile(Char.IsWhiteSpace)
            .Sum(c => c == '\t' ? tabWidth : 1));
    var spaces = new string(' ', tabWidth);
    return lines
        .Select(line => line.Replace("\t", spaces))
        .Select(line => line.Substring(Math.Min(line.Length, minDistance)));
}

This handles:

  • tab characters
  • source code that contains empty lines
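
As a rough illustration of those two cases, here is a self-contained restatement of the approach with a small sample input. The class name UnindentDemo and the Sample() helper are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class UnindentDemo
{
    // Restates the approach above so the example compiles on its own.
    public static IEnumerable<string> UnindentAsMuchAsPossible(
        IEnumerable<string> lines, int tabWidth = 4)
    {
        if (!lines.Any())
        {
            return Enumerable.Empty<string>();
        }

        // Smallest indentation over non-empty lines, counting a tab as tabWidth columns.
        var minDistance = lines
            .Where(line => line.Length > 0)
            .Min(line => line
                .TakeWhile(c => char.IsWhiteSpace(c))
                .Sum(c => c == '\t' ? tabWidth : 1));
        var spaces = new string(' ', tabWidth);
        return lines
            .Select(line => line.Replace("\t", spaces))
            .Select(line => line.Substring(Math.Min(line.Length, minDistance)));
    }

    // Hypothetical sample: tab-indented lines, one space-indented line, one empty line.
    public static string[] Sample() => new[]
    {
        "\tclass A",
        "\t{",
        "",
        "        int x;",
        "\t}"
    };
}
```

With the default tabWidth of 4, UnindentDemo.UnindentAsMuchAsPossible(UnindentDemo.Sample()) should yield "class A", "{", an empty line, "    int x;" and "}". Note that Replace also converts tabs that appear after the indentation, which may or may not be what you want.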

Upvotes: 4

JDB

Reputation: 25810

Use a little LINQ and Regex to find the shortest indentation, then remove that number of characters from all lines.

string[] l_lines = { 
                        "                                         public class MyClass",
                        "                                         {",
                        "                                             private bool MyMethod(string s)",
                        "                                             {",
                        "                                                 return s == \"\";",
                        "                                             }",
                        "                                         }"  
                   };

int l_smallestIndentation =
    l_lines.Min( s => Regex.Match( s, "^\\s*" ).Value.Length );

string[] l_result =
    l_lines.Select( s => s.Substring( l_smallestIndentation ) )
           .ToArray();

foreach ( string l_line in l_result )
    Console.WriteLine( l_line );

Prints:

public class MyClass
{
    private bool MyMethod(string s)
    {
        return s == "";
    }
}

This program will scan all strings in the array. If you can assume that the first line is the least indented, then you could improve performance by scanning only the first line:

int l_smallestIndentation =
    Regex.Match( l_lines[0], "^\\s*" ).Value.Length;

Also note that this will handle a tab character ("\t") as a single character. If there is a mix of tabs and spaces, then reversing the indent may be tricky. The easiest way to handle that would be to replace all instances of tabs with the appropriate number of spaces (often 4, though individual applications can vary wildly) before running the code above.
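
One possible way to do that replacement is sketched below. It is an assumption-laden helper, not part of the original answer: the name ExpandLeadingTabs is hypothetical, it only touches the leading whitespace, and it pads each tab to the next tab stop rather than always inserting a fixed number of spaces.

```csharp
using System;
using System.Text;

static class TabExpander
{
    // Hypothetical helper: expands tabs in the leading whitespace only and
    // leaves the rest of the line (including any later tabs) untouched.
    public static string ExpandLeadingTabs(string line, int tabWidth = 4)
    {
        var indent = new StringBuilder();
        int i = 0;
        for (; i < line.Length && (line[i] == ' ' || line[i] == '\t'); i++)
        {
            if (line[i] == '\t')
            {
                // Pad with spaces up to the next multiple of tabWidth.
                int pad = tabWidth - (indent.Length % tabWidth);
                indent.Append(' ', pad);
            }
            else
            {
                indent.Append(' ');
            }
        }
        return indent.ToString() + line.Substring(i);
    }
}
```

Running every line through a helper like this first means the character-counting approach sees spaces only.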

It would also be possible to modify the code above to give additional weight to tabs. At that point, the regex is no longer of much use.

string[] l_lines = { 
        "\t\t\tpublic class MyClass",
        "                        {",
        "                                private bool MyMethod(string s)",
        "                                {",
        "        \t        \t\treturn s == \"\";",
        "                                }",
        "\t\t\t}"  
    };

int l_tabWeight = 8;
int l_smallestIndentation =
    l_lines.Min
    (
        s => s.ToCharArray()
              .TakeWhile( c => Char.IsWhiteSpace( c ) )
              .Select( c => c == '\t' ? l_tabWeight : 1 )
              .Sum()
    );

string[] l_result =
    l_lines.Select
    (
        s =>
        {
            int l_whitespaceToRemove = l_smallestIndentation;
            while ( l_whitespaceToRemove > 0 )
            {
                l_whitespaceToRemove -= s[0] == '\t' ? l_tabWeight : 1;
                s = s.Substring( 1 );
            }
            return s;
        }
    ).ToArray();

Prints (assuming your console window has a tab width of 8 like mine):

public class MyClass
{
        private bool MyMethod(string s)
        {
                return s == "";
        }
}

You may need to modify this code to handle edge cases, such as zero-length lines or lines containing only whitespace.
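
A minimal sketch of one way to cover those edge cases follows, using the simpler character-counting variant. The class name and the choice to collapse whitespace-only lines to empty strings are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BlankLineSafeUnindent
{
    public static string[] Unindent(IEnumerable<string> lines)
    {
        var list = lines.ToList();

        // Only lines with visible content take part in the minimum.
        var indents = list
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => s.TakeWhile(c => char.IsWhiteSpace(c)).Count())
            .ToList();

        // Min() throws on an empty sequence, so fall back to 0.
        int smallest = indents.Count == 0 ? 0 : indents.Min();

        // Assumption: blank and whitespace-only lines become empty strings.
        return list
            .Select(s => string.IsNullOrWhiteSpace(s) ? "" : s.Substring(smallest))
            .ToArray();
    }
}
```

Because blank lines are excluded from the minimum, a stray empty line no longer forces the smallest indentation to zero.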

Upvotes: 2

Matt Houser

Reputation: 36073

To match your desired method interface:

IEnumerable<string> UnindentAsMuchAsPossible(string[] content)
{
  int minIndent = content.Select(s => s.TakeWhile(c => c == ' ').Count()).Min();
  return content.Select(s => s.Substring(minIndent)).AsEnumerable();
}

This gets the minimum indent of all lines (assumes spaces only, no tabs), then strips minIndent spaces from the start of each line and returns that as IEnumerable.

Upvotes: 1

Tim Schmelter

Reputation: 460108

This should work:

static IEnumerable<string> UnindentAsMuchAsPossible(IEnumerable<string> input)
{
    int minDistance = input.Min(l => l.TakeWhile(Char.IsWhiteSpace).Count());
    return input.Select(l => l.Substring(minDistance));
}

It shifts every line left by the same amount: the smallest number of leading whitespace characters found on any line.

For example:

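// The code starts right after @" so there is no short whitespace-only first line,
// and quotes inside a verbatim string must be doubled.
string testString = @"                     public class MyClass
                     {
                         private bool MyMethod(string s)
                         {
                             return s == """";
                         }
                     }";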


string[] lines = testString.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
string[] unindentedArray = UnindentAsMuchAsPossible(lines).ToArray();

Upvotes: 3

Servy

Reputation: 203819

Just count the number of leading spaces on the first line, and then "remove" that many characters from the start of each line (this assumes no line is indented less than the first):

IEnumerable<string> UnindentAsMuchAsPossible(string[] content)
{
    int spacesOnFirstLine = content[0].TakeWhile(c => c == ' ').Count();
    return content.Select(line => line.Substring(spacesOnFirstLine));
}
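
If shorter or less-indented lines can occur, Substring would either throw or cut into the code. A hedged variant that never removes more than a line's own leading spaces might look like this; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FirstLineUnindent
{
    // Counts the leading spaces of a single line.
    static int LeadingSpaces(string line) => line.TakeWhile(c => c == ' ').Count();

    public static IEnumerable<string> UnindentByFirstLine(string[] content)
    {
        if (content.Length == 0)
        {
            return content;
        }

        int spacesOnFirstLine = LeadingSpaces(content[0]);

        // Never remove more than the line's own leading spaces.
        return content.Select(line =>
            line.Substring(Math.Min(spacesOnFirstLine, LeadingSpaces(line))));
    }
}
```

This keeps the single scan of the first line but costs one extra pass over each line's indentation.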

Upvotes: 3

Anders Abel

Reputation: 69260

This will first find the minimum indent and then remove that number of spaces from each line.

var code = new [] { "  foo", "   bar" };

var minIndent = code.Select(line => line.TakeWhile(ch => ch == ' ').Count()).Min();
var formatted = code.Select(line => line.Remove(0, minIndent));

Everything could be written as a single expression, and while that is more functionally elegant, I think the minIndent variable makes the code more readable.
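
For comparison, a single-expression form could look roughly like the sketch below (the wrapper class and method name are hypothetical). Written this way, the minimum is recomputed for every line, which is another reason to prefer the variable.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SingleExpression
{
    // One expression, but code.Min(...) runs again for each line that is projected.
    public static IEnumerable<string> Unindent(string[] code) =>
        code.Select(line => line.Remove(0,
            code.Min(l => l.TakeWhile(ch => ch == ' ').Count())));
}
```

Calling SingleExpression.Unindent(new[] { "  foo", "   bar" }) should produce "foo" and " bar", the same as the two-statement version.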

Upvotes: 1
