edgarmtze
edgarmtze

Reputation: 25048

How to align columns of different length in a txt file, based on max length of each column?

Having a txt file with 18 columns delimited by '' and separated by , where each line represents an Insert statement of a sqlite query:

    (1999,1999,1999,1999,1999,0,0,'flaggr.png',261,     'Βάκχειος',             'Spl-up','B ',  'Pagrati/Athens,Attica,Greece',     'N/A',   'Hellenic Mythology',      '','', ''),
    (2000,2000,2000,2000,2000,0,2010,'flagru.png',3340, 'Анклав Снов',              'Act',    'G/D ',   'Bryansk,Russia',       '2008-2009(as Vampire''s Crypt),2010-present',   'N/A',     '','', ''),
    (2001,2001,2001,2001,2001,0,2002,'flagru.png',271,  'Аркона',               'Act','P/FO ',  'Moscow,Russia',        '2002(as Гиперборея),2002-present',  'Slavic Pism and FOtales, Legends, Mythology',     '', '', ''),
    (2002,2002,2002,2002,2002,0,1988,'flagru.png',470,      'Аспид',                'Spl-up','PROG ',   'Volgodonsk,Rostovregion,Russia',       '1988-1997,2010-?',  'Politics, Horror, Death',     '', '', ''),
    (2003,2003,2003,2003,2003,0,2000,'flagua.png',359,  'Ірій',             'Unknown','FO D /G ',   'Lviv,Ukraine',     '2000-?',    'Slavic mythology, Ukrainian FOlore',      '', '', ''),
    (2004,2004,2004,2004,2004,0,2011,'flagru.png',3036579,  'Лесьяр',               'Act','P FO ',  'Moscow,Russia',        '2011-present',  'Pism, FOlore, Social matters, Feelings',      '', '', ''),
    (2005,2005,2005,2005,2005,0,2003,'flagru.png',218,  'М8Л8ТХ',               'Act','B  with RAC',    'Tver,Ukraine(posterior),Russia',       '2003-present',  'National Pride, National Socialism, Hatred, War, Intolerance, Pism',      '', '', ''),
    (2006,2006,2006,2006,2006,0,0,'flagru.png',354037,      'Рельос',               'Act','PR/POST-/ (early), G/POST-, Ambient (later)',    'Baltiisk,Kaliningradregion,Russia',        'N/A',   'N/A',     '', '',''),
    (2007,2007,2007,2007,2007,0,2006,'flagru.png',32937,    'Сивый Яр',             'Act','P/POST-B ',  'Vyritsa,Leningradregion,Russia',       '2006-present',  'Pism, Pride, Heritage, Poetry, Slavonic Mythology',       '', '', ''),
    (2008,2008,2008,2008,2008,0,2001,'flagru.png',44,       'Темнозорь',                'Act','FO/B ',  'Moscow,Russia',        '2001-present',  'Nature, Slavonic Pism, War, Right-wing nationalism',      '4394', '', ''),
    (2009,2009,2009,2009,2009,0,1993,'flagru.png',80,       'Эпидемия',             'Act','Pow ',   'Moscow,Russia',        '1993-present',  'Fantasy, Tolkien, Elves',     '', '', ''),
    (2010,2010,2010,2010,2010,0,0,'flagjp.png',354039,      'こくまろみるく',              'Act','G/Pow ', 'N/A,Japan',        'N/A',   'Bizarre, Macabre',        '', '', ''),
    (2011,2011,2011,2011,2011,0,2012,'flagus.png',38723,    'מזמור',                'Act','B/Drone/D ', 'Portland,Oregon,United States',        '2012-present',  'N/A',     '', '', ''),
    (2012,2012,2012,2012,2012,0,2004,'flaglb.png',67,   'دمار',             'Spl-up','B/Death ',    'Hamra,Beirut,Lebanon',     '2004-2006',     'War, Pride, Blasphemy, Supremacy',        '', '', ''),
    (2013,2013,2013,2013,2013,0,2006,'flagcn.png',760,  '原罪',               'Act','B  (early), G/B  (later)',   'Chengdu,SichuanProvince,China',        '2006-present',  'Misanthropy, Hatred, Depression, War, Revelation',        '', '', ''),
    (2014,2014,2014,2014,2014,0,1995,'flagtw.png',443,      '閃靈',               'Act','Melodic B/Death/FO ',    'Taipei,Taiwan',        '1995-present',  'Taiwanese Myths and Legends, Anti-Fascism, History',      '4443', '', ''),
    (2015,2015,2015,2015,2015,0,2001,'flagjp.png',31450,    '電気式華憐音楽集団',                'Act','Pow/G',  'N/A,Japan',        '2001-present',  'Anime, Fantasy, Liberty',     '', '', '');

What would be the best way to align all columns so for instance the first two rows become:

(1999,1999,1999,1999,1999,0,0,   'flaggr.png',261,  'Βάκχειος',     'Spl-up',   'B ',   'Pagrati/Athens,Attica,Greece', 'N/A',                                          'Hellenic Mythology',   '','', ''),
(2000,2000,2000,2000,2000,0,2010,'flagru.png',3340, 'Анклав Снов',  'Act',      'G/D ', 'Bryansk,Russia',               '2008-2009(as Vampire''s Crypt),2010-present',  'N/A',                  '','', ''),

I was thinking on:

  1. Split all lines strings in file using comma as separator
  2. Compute each column max length and store it in memory
  3. Loop again the file but this time use computed max length and write output

The code I came was something like the following, however I realized one issue, there are some columns that have comma inside single quote like 'bla1,bla2,bla3' (columns 12 to 18 could have inner commas...) so if I split string using comma, I will not get 18 columns.

After that problem I do not know how to continue... What would be the way to split by comma, but considering single quote of some strings?

    private static void AdjustColumnsInFile(string filePath, string outputFile)
    {
        //array to store max size of each column
        int[] sizes = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
        foreach (var line in File.ReadLines(filePath))
        {
            var words = line.Split(',');
            if (words.Length == 18)
            {
                var i = 0;
                //get max value of each column
                foreach (var word in words)
                {
                    sizes[i] = sizes[i] < word.Length ? word.Length : sizes[i];
                    i++;
                }
            }
        }

        ...

        using (var sw = new StreamWriter(outputFile))
        {
            foreach (var l in newLines)
            {
                sw.WriteLine($"{l}");
            }
        }
    }

Upvotes: 3

Views: 325

Answers (1)

Evk
Evk

Reputation: 101493

As I understand, your only problem is how to split string on commas given that some commas might appear inside '' quotes. You can do that with regular expression:

,(?=(?:[^\']*\'[^\']*\')*[^\']*$)

It basically matches comma which is followed by zero or even number of quotes ('). If comma appears inside '' quotes - in a valid string it will be followed by odd number of quotes, so will not match.

The rest should be easy, first calculate sizes:

//array to store max size of each column
int[] sizes = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
foreach (var line in File.ReadLines(filePath)) {
    var tmp = line.Trim(); // remove leading and trailing whitespace
    tmp = tmp.Remove(tmp.Length - 2, 2); // remove closing ) and , or ;
    tmp = tmp.Remove(0, 1); // remove opening (   
    // split by comma                 
    var words = Regex.Split(tmp, @",(?=(?:[^\']*\'[^\']*\')*[^\']*$)");
    if (words.Length == 18) {
        for (int i = 0; i < words.Length; i++) {
            var word = words[i].Trim(); // remove whitespace
            sizes[i] = sizes[i] < word.Length ? word.Length : sizes[i];
        }
    }
    else throw new Exception("Invalid number of columns");
}

Then repeat and append spaces to columns which do not match expected size:

using (var writer = new StreamWriter(outputFile)) {
    foreach (var line in File.ReadLines(filePath)) {                    
        var tmp = line.Trim(); // remove trailing whitespace
        bool hadTrailingComma = tmp.EndsWith(",");
        tmp = tmp.Remove(tmp.Length - 2, 2); // remove closing ) and , or ;
        tmp = tmp.Remove(0, 1); // remove opening (                                                            
        var words = Regex.Split(tmp, @",(?=(?:[^\']*\'[^\']*\')*[^\']*$)");
        var newLine = String.Join(",", words.Select((w, i) =>
        {
            w = w.Trim();
            var targetSize = sizes[i];
            if (w.Length < targetSize)
                return w + new string(' ', targetSize - w.Length); // append spaces until max length
            return w;
        }));

        writer.WriteLine($"({newLine}){(hadTrailingComma ? "," : ";")}");
    }
}

Note that because of unicode characters such as こくまろみるく your output file might appear not aligned correctly, while in reality it is (that is - each column has the same size in characters).

Upvotes: 2

Related Questions