Reputation: 251
I'm writing a piece of code that takes in an access log and cleans it: it removes all the data that I don't need and gives me a clean version. I've stripped out the redundant data, but I need to re-format the date, which is one of the fields in the text file. (Below is the cleaned text file thus far.)
I had initially planned on splitting on '/', putting the three elements of the date into an array (day, month, year), and re-arranging them so that the date was in American format. However, splitting on '/' also breaks the file path, and I don't want that.
Below is my code so far. Any help or ideas would be much appreciated!
StreamReader reader = new StreamReader(fileName);
StreamWriter writer = new StreamWriter(newFileName);
string line;
string personalIdentifier = new string(fileName.Take(4).ToArray());
string gender = fileName.Substring(fileName.Length - 5, 1);
string classification = fileName.Substring(fileName.Length - 8, 2);
string text = string.Empty;
while ((line = reader.ReadLine()) != null)
{
    string[] cleanArray = new string[7];
    // split on quotes first so that quoted fields stay in one piece,
    // then split the unquoted parts on spaces
    var result = line.Split('"')
        .Select((element, index) => index % 2 == 0
            ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            : new string[] { element })
        .SelectMany(element => element).ToList();
    cleanArray[0] = personalIdentifier;
    cleanArray[1] = gender;
    cleanArray[2] = classification;
    cleanArray[3] = result[0];
    cleanArray[4] = result[3];
    cleanArray[5] = result[5];
    cleanArray[6] = result[6];
    //removing the [ at the start of the date
    cleanArray[4] = cleanArray[4].Substring(1);
    //re-formatting the date so that it can be accepted by machine learning
    var dateString = cleanArray[4];
    var date = DateTime.ParseExact(dateString, "dd/MMM/yyyy:HH:mm:ss", CultureInfo.InvariantCulture);
    var newDateString = date.ToString("yyyy-MM-dd HH:mm:ss");
    //store the re-formatted date back so that it is the value written out
    cleanArray[4] = newDateString;
    //push each clean array onto the file that has been automatically created at the top
    writer.WriteLine(string.Join(", ", cleanArray.Select(v => v.ToString())));
    writer.WriteLine();
}
reader.DiscardBufferedData();
writer.Close();
reader.Close();
}
Upvotes: 1
Views: 940
Reputation: 6155
You can parse your string as a DateTime and then format the DateTime back into a string in the format you want. Something like this:
var dateString = "29/Oct/2014:13:36:07";
var date = DateTime.ParseExact(dateString, "dd/MMM/yyyy:HH:mm:ss", CultureInfo.InvariantCulture);
var newDateString = date.ToString("yyyy-MM-dd");
and you'll get
2014-10-29
If you need the time as well change the last command to
var newDateString = date.ToString("yyyy-MM-dd HH:mm:ss");
and you'll get
2014-10-29 13:36:07
For more information, have a look at the MSDN documentation.
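If some lines in the log might not contain a well-formed date, DateTime.TryParseExact avoids the exception that ParseExact throws on bad input. The following is only a minimal sketch: the class name LogDateFormatter and the method name Reformat are made up for illustration, and returning null for unparseable input is an assumption about how you would want to handle such lines.

```csharp
using System;
using System.Globalization;

public static class LogDateFormatter
{
    // Hypothetical helper: turns "29/Oct/2014:13:36:07" into "2014-10-29 13:36:07".
    // Returns null when the input does not match the expected log format.
    public static string Reformat(string logDate)
    {
        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            logDate,
            "dd/MMM/yyyy:HH:mm:ss",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);

        // Format with the invariant culture as well, so the output does not
        // depend on the machine's regional settings.
        return ok
            ? parsed.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture)
            : null;
    }
}
```

In the loop from the question, this would be called on the date field after the leading [ has been removed, and a null result could be used to skip or flag the line.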
Upvotes: 2
Reputation: 9804
Storing data in string format (or retrieving it from that format) is tricky. You need to get three things right: the encoding, the culture format, and the timezone.
My first advice is to never store or transmit DateTimes as strings. Failing that:
You probably do know the encoding of the file. And as this is a log, it will probably stick to plain old ASCII. Encodings become a lot less relevant for characters below 128. XML tends to take care of this part for you.
Culture format is a problem. By default, ToString() and Parse() and all their variants retrieve the culture format from Windows. And just because two cultures share a language does not mean they share the same formats for anything. For example, UK and US date formats are totally different. So always pick a fixed one and hardcode it. The automatic culture format only works for direct user input, nothing else.
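As a rough illustration of "pick a fixed one and hardcode it", the sketch below writes and reads a DateTime with one hardcoded format string and CultureInfo.InvariantCulture on both sides. The class and method names (FixedCultureRoundTrip, Write, Read) are hypothetical, and the format string is simply the one already used in this thread.

```csharp
using System;
using System.Globalization;

public static class FixedCultureRoundTrip
{
    // One hardcoded format, shared by the writer and the reader.
    private const string Format = "yyyy-MM-dd HH:mm:ss";

    // Always format with the invariant culture, never the current one.
    public static string Write(DateTime value)
    {
        return value.ToString(Format, CultureInfo.InvariantCulture);
    }

    // Parse with exactly the same format and culture that Write used.
    public static DateTime Read(string text)
    {
        return DateTime.ParseExact(text, Format, CultureInfo.InvariantCulture);
    }
}
```

Passing the culture explicitly to ToString matters even with a custom format string: the ":" specifier is replaced by the culture's time separator, and some cultures use a non-Gregorian default calendar, so the same call can produce different text on different machines.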
For the timezone, my advice is to always store and retrieve as UTC. Otherwise you have to adapt for the timezone used by the original writer (which you might not know and which might not be consistent) and your own (which DateTime.ToString() will do for you).
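One possible way to normalise to UTC, assuming the log line carries a UTC offset right after the timestamp (for example "29/Oct/2014:13:36:07 +0100", as in the common Apache-style layout; the thread does not show whether this particular log does), is to parse into a DateTimeOffset and take its UtcDateTime. The class and method names here are made up for the sketch.

```csharp
using System;
using System.Globalization;

public static class LogTimestamp
{
    // Hypothetical helper. Assumes the input looks like
    // "29/Oct/2014:13:36:07 +0100" (timestamp, space, UTC offset).
    public static string ToUtcString(string stamp)
    {
        // "zzz" reads the signed hours-and-minutes offset.
        DateTimeOffset parsed = DateTimeOffset.ParseExact(
            stamp,
            "dd/MMM/yyyy:HH:mm:ss zzz",
            CultureInfo.InvariantCulture);

        // UtcDateTime applies the offset, so the result no longer depends
        // on the timezone of whoever wrote the log.
        return parsed.UtcDateTime.ToString(
            "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);
    }
}
```

With the question's splitting code, the offset would presumably be the field after the date (with a trailing ] to strip), so the two fields would need to be joined with a space before calling this.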
Upvotes: 0