Michael

Reputation: 251

How to reformat the date WITHIN a text file C#

I'm writing a piece of code that takes in an access log and cleans it: it removes all the data I don't need and outputs a clean version. I've removed the redundant data, but I still need to reformat the date, which is one of the fields in the text file. (Below is the cleaned text file so far.)

[Image: the cleaned text file so far]

I had initially planned to split it on '/', put the three elements of the date (day, month, year) into an array and rearrange them so that the date was in American format. However, this also splits the file path on '/', and I don't want that.
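A minimal sketch of one way around the path problem: instead of splitting the whole line on '/', match only the bracketed timestamp with a regular expression and rewrite just that. It assumes the raw log uses an Apache-style timestamp such as `[29/Oct/2014:13:36:07 +0000]` (the same sample date used in the answers); the class and method names are made up for illustration, and the timezone offset is simply dropped.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class TimestampRewriter
{
    // Matches a timestamp such as [29/Oct/2014:13:36:07 +0000].
    // Group 1 captures the date and time; the optional " +0000" offset is matched but discarded.
    private static readonly Regex Timestamp =
        new Regex(@"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2})(?: [+-]\d{4})?\]");

    // Rewrites only the bracketed timestamp into American MM/dd/yyyy order.
    // Every other '/' in the line (e.g. in the request path) is left untouched.
    public static string RewriteDate(string line)
    {
        return Timestamp.Replace(line, match =>
        {
            DateTime date = DateTime.ParseExact(
                match.Groups[1].Value, "dd/MMM/yyyy:HH:mm:ss", CultureInfo.InvariantCulture);
            return date.ToString("MM/dd/yyyy HH:mm:ss", CultureInfo.InvariantCulture);
        });
    }
}
```

Lines that contain no such timestamp are returned unchanged, and the square brackets are removed along with the old date.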

Below is my code so far, any help or ideas would be much appreciated!


using System;
using System.Globalization;
using System.IO;
using System.Linq;

// The original snippet omitted the method signature; CleanLog is a hypothetical name.
static void CleanLog(string fileName, string newFileName)
{
    // values taken from fixed positions in the input file name
    string personalIdentifier = new string(fileName.Take(4).ToArray());
    string gender = fileName.Substring(fileName.Length - 5, 1);
    string classification = fileName.Substring(fileName.Length - 8, 2);

    // using blocks close (and flush) both files even if an exception is thrown
    using (var reader = new StreamReader(fileName))
    using (var writer = new StreamWriter(newFileName))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // split on '"' so the quoted request stays one token; split the rest on spaces
            var result = line.Split('"')
                .Select((element, index) => index % 2 == 0
                    ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                    : new string[] { element })
                .SelectMany(element => element).ToList();

            string[] cleanArray = new string[7];
            cleanArray[0] = personalIdentifier;
            cleanArray[1] = gender;
            cleanArray[2] = classification;
            cleanArray[3] = result[0];
            cleanArray[4] = result[3];
            cleanArray[5] = result[5];
            cleanArray[6] = result[6];

            //removing the [ at the start of the date
            cleanArray[4] = cleanArray[4].Substring(1);

            //re-formatting the date so that it can be accepted by machine learning
            var date = DateTime.ParseExact(cleanArray[4], "dd/MMM/yyyy:HH:mm:ss", CultureInfo.InvariantCulture);
            // store the new string back in the array; otherwise the old date is what gets written
            cleanArray[4] = date.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);

            //write each clean record to the output file, followed by a blank line
            writer.WriteLine(string.Join(", ", cleanArray));
            writer.WriteLine();
        }
    }
}
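The per-line logic above can also be pulled into a small function so it can be checked against a single sample line before running it over a whole file. This is a sketch: the sample line, the parameter values and the names `LogLineCleaner`/`CleanLine` are hypothetical, and it assumes each line is in Apache Common Log Format, so that the field indices 0, 3, 5 and 6 used in the question hold the client IP, the date, the request line and the status code.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class LogLineCleaner
{
    public static string CleanLine(string line, string personalIdentifier, string gender, string classification)
    {
        // Split on '"' so the quoted request (which contains spaces and the path) stays one token;
        // the unquoted segments (even indices) are split further on spaces.
        var result = line.Split('"')
            .Select((element, index) => index % 2 == 0
                ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                : new[] { element })
            .SelectMany(element => element)
            .ToList();

        // result[3] looks like "[29/Oct/2014:13:36:07"; drop the leading '[' before parsing.
        DateTime date = DateTime.ParseExact(
            result[3].Substring(1), "dd/MMM/yyyy:HH:mm:ss", CultureInfo.InvariantCulture);

        var cleanArray = new[]
        {
            personalIdentifier,
            gender,
            classification,
            result[0],                                                           // client IP
            date.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture),  // reformatted date
            result[5],                                                           // request line
            result[6],                                                           // status code
        };
        return string.Join(", ", cleanArray);
    }
}
```

For example, `CleanLine("127.0.0.1 - - [29/Oct/2014:13:36:07 +0000] \"GET /images/logo.png HTTP/1.1\" 200 512", "abcd", "M", "01")` should give `abcd, M, 01, 127.0.0.1, 2014-10-29 13:36:07, GET /images/logo.png HTTP/1.1, 200`.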

Upvotes: 1

Views: 940

Answers (2)

Mighty Badaboom

Reputation: 6155

You can parse your string as a DateTime and then format the DateTime back into a string in the format you want. Something like this:

// requires: using System.Globalization;
var dateString = "29/Oct/2014:13:36:07";
var date = DateTime.ParseExact(dateString, "dd/MMM/yyyy:HH:mm:ss", CultureInfo.InvariantCulture);
var newDateString = date.ToString("yyyy-MM-dd");

and you'll get

2014-10-29

If you need the time as well, change the last line to

var newDateString = date.ToString("yyyy-MM-dd HH:mm:ss");

and you'll get

2014-10-29 13:36:07

For more information, have a look at MSDN.
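A variation on the same idea, as a hedged sketch: `DateTime.TryParseExact` lets you skip or flag lines whose date does not match the expected pattern instead of throwing an exception, and passing `CultureInfo.InvariantCulture` to `ToString` as well keeps the `:` time separator fixed regardless of the machine's regional settings. The class and method names are illustrative only.

```csharp
using System;
using System.Globalization;

public static class DateReformatter
{
    // Returns the reformatted date, or null if the input does not match the log's date pattern.
    public static string TryReformat(string dateString)
    {
        if (DateTime.TryParseExact(dateString, "dd/MMM/yyyy:HH:mm:ss",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime date))
        {
            // Invariant culture on output too, so ':' is not replaced by a culture-specific separator.
            return date.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);
        }
        return null;
    }
}
```

A caller could then write the line only when the result is not null, or log the offending line for inspection.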

Upvotes: 2

Christopher

Reputation: 9804

Storing data in string format (or retrieving it from one) is tricky. You need to know the following things:

  • the encoding used to store the string in bytes
  • the culture format used to store the DateTime
  • the timezone the DateTime is applied to.

My first advice is to never store or transmit DateTimes as strings. Failing that:

You probably do know the encoding of the file. And as this is a log, it will probably stick to plain ASCII. Encodings become a lot less relevant for characters up to code 127, since ASCII, UTF-8 and most code pages encode those identically. XML tends to take care of this part for you.
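To illustrate the point about characters up to code 127, here is a small sketch comparing the bytes produced by `Encoding.ASCII` and `Encoding.UTF8`: plain log text encodes identically, while a non-ASCII character such as `é` does not. It also shows passing the encoding explicitly when opening the file; UTF-8 there is an assumption about the log, not something the question states, and the names are hypothetical.

```csharp
using System.IO;
using System.Linq;
using System.Text;

public static class EncodingDemo
{
    // True if ASCII and UTF-8 produce exactly the same bytes for the given text.
    public static bool SameBytes(string text) =>
        Encoding.ASCII.GetBytes(text).SequenceEqual(Encoding.UTF8.GetBytes(text));

    // Opens the log with an explicitly chosen encoding instead of relying on the default.
    public static StreamReader OpenLog(string path) => new StreamReader(path, Encoding.UTF8);
}
```

`SameBytes("GET /index.html 200")` is true, while `SameBytes("café")` is false: UTF-8 uses two bytes for `é`, and the ASCII encoder replaces it with `?`.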

Culture format is a problem. By default, ToString(), Parse() and all their variants use the current culture, which comes from the Windows regional settings. And just because two cultures share a language does not mean they share formats for anything. For example, UK and US date formats are totally different (day/month/year versus month/day/year). So always pick a fixed format and hardcode it. The automatic culture format only works for direct user input, nothing else.
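As a sketch of why hardcoding the format matters: the same text `01/02/2014` is 1 February in day-first (UK-style) order and 2 January in month-first (US-style) order. The example uses explicit patterns with `CultureInfo.InvariantCulture` rather than the `en-GB`/`en-US` cultures, so it does not depend on which culture data is installed; the names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class CultureDemo
{
    // Parses with an explicitly chosen pattern and a fixed culture,
    // so the result no longer depends on the machine's regional settings.
    public static DateTime ParseWith(string text, string format) =>
        DateTime.ParseExact(text, format, CultureInfo.InvariantCulture);

    // Writes the date back out in an unambiguous, culture-independent form.
    public static string ToIso(DateTime date) =>
        date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
}
```

`ToIso(ParseWith("01/02/2014", "dd/MM/yyyy"))` gives `2014-02-01`, whereas `ToIso(ParseWith("01/02/2014", "MM/dd/yyyy"))` gives `2014-01-02`.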

For the timezone, my advice is to always store and retrieve as UTC. Otherwise you have to adapt for the timezone used by the original writer (which you might not know, and which might not be consistent) and for your own (which DateTime.ToLocalTime() will handle for you).
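A minimal sketch of normalizing to UTC, assuming the raw log line carries an offset such as `+0100` after the time, as Apache-style logs usually do. The offset is parsed by hand here to keep things simple, and the names are illustrative.

```csharp
using System;
using System.Globalization;

public static class UtcConverter
{
    // Converts a timestamp such as "29/Oct/2014:13:36:07" plus an offset such as "+0100"
    // into a UTC string in yyyy-MM-dd HH:mm:ss form.
    public static string ToUtcString(string dateTime, string offset)
    {
        DateTime local = DateTime.ParseExact(dateTime, "dd/MMM/yyyy:HH:mm:ss", CultureInfo.InvariantCulture);

        // offset is expected as "+hhmm" or "-hhmm"
        int hours = int.Parse(offset.Substring(1, 2), CultureInfo.InvariantCulture);
        int minutes = int.Parse(offset.Substring(3, 2), CultureInfo.InvariantCulture);
        TimeSpan span = new TimeSpan(hours, minutes, 0);
        if (offset[0] == '-') span = span.Negate();

        // Attach the writer's offset, then let DateTimeOffset do the conversion to UTC.
        var stamped = new DateTimeOffset(local, span);
        return stamped.UtcDateTime.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);
    }
}
```

With this sketch, 13:36:07 at `+0100` becomes 12:36:07 UTC and at `-0500` becomes 18:36:07 UTC.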

Upvotes: 0
