How can I change certain cells in a certain column in a CSV file with C # when a certain condition is triggered?

Question

So in my program the user can choose a file with an OpenFileDialog and if he wants to save the file with a SaveFileDialog, the columns and rows of the csv file should change. For this I have already tried this

SaveFileDialog:

 List liste = new List();
        // Build the header row into the output:
        liste.Add(String.Join(',', Enum.GetNames()));
        CultureInfo ci = new CultureInfo("de-DE"); // neccessary only when running the code from other cultures.

        SaveFileDialog dialog = new SaveFileDialog();
        dialog.Filter = "CVS (*.cvs)|*.csv|All files (*.*)|*.*";

        if (dialog.ShowDialog() == true)
        {
            string line;

            // Read the file and display it line by line.  
            try
            {
                System.IO.StreamReader file = new System.IO.StreamReader(path);
                while ((line = file.ReadLine()) != null)
                {
                    var cellArray = Regex.Split(line, @"[	,](?=(?:[^\""]|\""[^\""]*\"")*$)")
                                            .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", "")).ToArray();

                    // Check value of Betrag, only operate on it if there is a decimal value there
                    if (decimal.TryParse(cellArray[(int)CsvColumn.Betrag], NumberStyles.Any, ci, out decimal betrag))
                    {
                        if (betrag >= 0)
                        {
                            cellArray[(int)CsvColumn.Soll] = "42590";
                            cellArray[(int)CsvColumn.Haben] = "441206";
                        }
                        else
                        {
                            cellArray[(int)CsvColumn.Soll] = "441206";
                            cellArray[(int)CsvColumn.Haben] = "42590";
                        }

                        // Assuming we only write to the purple field when the green field was a decimal:
                        cellArray[(int)CsvColumn.Belegnummer] = "a dummy text";
                    }

                    // Make sure you escape the columns that have a comma
                    liste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x)) + "
");
                }

                File.WriteAllLines(dialog.FileName, liste);
                file.Close();
            }
            catch
            {
                MessageBox.Show("Der Gewählte Prozess wird bereits von einem anderen verwendet,
 " +
                    " bitte versuchen sie es erneut");
            }
        }

With that I change the header now, but now I want that when you look at the picture here:

Operations I want to perform:

when the green field is in the positive area, 42590 should be in the blue field and 441206 in the orange field.
if the green value is negative then 441206 should be in the blue field and 42590 in the orange.
In the purple field, a dummy text should simply be automatically written.

So how do I use my C # code to fill in the fields that I have marked in the code?

EDIT

An Example from my Input CSV File in Text Format:

Datum;Wertstellung;Kategorie;Name;Verwendungszweck;Konto;Bank;Betrag;Währung 31.10.2019;01.11.2019;;Some Text;;;;-42,89;EUR 31.10.2019;01.11.2019;;Some Text;;;;-236,98;EUR 31.10.2019;31.10.2019;;Some Text;;;;-200;EUR 30.10.2019;29.10.2019;;Some Text;;;;204,1;EUR 30.10.2019;31.10.2019;;Some Text;;;;-646,98;EUR

Chris Schaller · Accepted Answer

The task itself is pretty simple, but your attempt shows many external influences and almost no documentation. This leads to many comments to your post, but a best-practise answer really needs to address many of the smaller elements that you have so far overlooked. You already have the file management sorted out, so I'll try to focus on the array logic.

Make sure you have run and debugged your code before posting, the output from the initial post has a few quirks:
Your input file uses a semi-colon, so you need to split the line by ONLY THAT CHARACTER in your regular expression:
var cellArray = Regex.Split(line, @"[;](?=(?:[^\""]|\""[^\""]*\"")*$)")
       .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", "")).ToArray();
You can't assume to split the string by multiple delimiters at the same time because only value that contain the file specific delimiter will be quote escaped.
This line is doing nothing, it looks like a previous attempt, .Split() and .ToArray() return new values, they do not manipulate the source value, as you are not using the result of this line of code just remove it:
//line.Split(new char[] { '	' }).ToArray();
The header row is being written into the first cell of the first row, while it may look like it works, I challenge you to explain the intent. You have also used a semicolon as the delimiter, even though the rest of your output is using comma, so this is fixed too. You will also find it far simpler to write this header row first, before we even read the input file:
List liste = new List();
// Build the header row into the output:
liste.Add("Belegdatum,Buchungsdatum,Belegnummer,Buchungstext,Verwendungszweck,Soll,Haben,Betrag,Währung");
With the german decimal separator being a comma, you will also need to escape the Betrag decimal value in the output
liste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x)) + "
");
Alternatively, you could use a semi-colon like your input data however it is still good practise to test for and escape the values that might contain the delimiter character.
Do you really want the additional line break in the output?

It is not necessary to append each line with the " " line feed character because you are later using WriteAllLines(). This method accepts an array of lines and will inject the line break between each line for you. In file processing like this it is only necessary to manually include the line feed if you were storing the output as a single string variable and perhaps later using WriteAllText() to write the final output to file.

This is often not clear when referencing different guidance material on text file manipulations, be aware of this if you copy one technique from an article that maintains an array of the lines, and a separate example that uses a single string variable or StringBuilder or StringWriter approaches.

The line from above now becomes this, note the trailing has been removed:
liste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x)));

tldr; - Show me the codez!

A Simple forward processing approach

It makes for light-weight code but complex logic can be much harder to read, however as you parse each line into the array, you can simply manipulate the values based on your rules. We can refer to this as sequential, in-line or forward processing because we read the input, process and prepare the output one row at a time.

 List liste = new List();
 // Build the header row into the output:
 liste.Add("Belegdatum,Buchungsdatum,Belegnummer,Buchungstext,Verwendungszweck,Soll,Haben,Betrag,Währung");
 CultureInfo ci = new CultureInfo("de-DE"); // necessary only when running the code from other cultures.

 SaveFileDialog dialog = new SaveFileDialog();
 dialog.Filter = "CVS (*.cvs)|*.csv|All files (*.*)|*.*";

 if (dialog.ShowDialog() == true)
 {
     string line;

     // Read the file and display it line by line.  
     try
     {
         System.IO.StreamReader file = new System.IO.StreamReader(path);
         int counter = 0;
         while ((line = file.ReadLine()) != null)
         {
             counter++;
             var cellArray = Regex.Split(line, @"[;](?=(?:[^\""]|\""[^\""]*\"")*$)")
                                     .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", "")).ToArray();

             // Skip lines that fail for any reason
             try
             {
                 // Check value of Betrag, only operate on it if there is a decimal value there
                 if (decimal.TryParse(cellArray[7], NumberStyles.Any, ci, out decimal betrag))
                 {
                     if (betrag >= 0)
                     {
                         cellArray[5] = "42590";
                         cellArray[6] = "441206";
                     }
                     else
                     {
                         cellArray[5] = "441206";
                         cellArray[6] = "42590";
                     }

                     // Assuming we only write to the purple field when the green field was a decimal:
                     cellArray[2] = "a dummy text";
                 }
                 else
                 {
                     // Skip lines where the Betrag is not a decimal
                     // this will cover the case when or if the first line is the header.                        
                     continue;

                 }
             }
             catch(Exception ex)
             {
                 // Construct a message box to help the user resolve the issue.
                 // You can use the MessageBox API to allow the user to cancel the process if you want to extend this.
                 // or remove the message altogether if you want it to silently skip the erroneous rows.
                 MessageBox.Show("Fehler beim Analysieren der Eingabezeile,
" +
                     $"{ex.Message}

 " +
                     $"{counter}:> {line} 
 " +
                     $"{new String(' ', counter.ToString().Length)} - {cellArray.Length} Cells
 " +
                     $"|{String.Join("|", cellArray)}|
 " + 
                     "
 " +
                     " Zeile wird verworfen, weiter!");

                 continue; // advance to the next iteration of the while loop.
             }

             // Make sure you escape the columns that have a comma
             liste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x)));
         }

         File.WriteAllLines(dialog.FileName, liste);
         file.Close();
     }
     catch
     {
         MessageBox.Show("Der Gewählte Prozess wird bereits von einem anderen verwendet,
 " +
             " bitte versuchen sie es erneut");
     }
 }

Use Named Constants

If you are trying to avoid an OO approach, then it can make the code easier to read by introducing some constants to refer to the indexes, there are many variations to this, but making the code more human readable will assist in future maintenance and understanding of the code.

Define the constants, I recommend doing this inside a static class definition to group these values together, rather than just defining them a local or instance variables.

An enum is another way to do this if you simply need to map strings to ints, or just want to give integer values a name.
```
public enum CsvColumn
{
    Belegdatum = 0,
    Buchungsdatum = 1,
    Belegnummer = 2,
    Buchungstext = 3,
    Verwendungszweck = 4,
    Soll = 5,
    Haben = 6,
    Betrag = 7,
    Währung = 8
}
```
Enums have the added benefit of simple commands to retrive all the names of the columns, now we can use this to build the header line AND as the index references in the code:
```
List liste = new List();
// Build the header row into the output:
liste.Add(String.Join(',', Enum.GetNames()));
```
In previous versions of .Net the generic overload for Enum functions were not defined, in that case you will need to cast the type of the enum:
```
liste.Add(String.Join(',', Enum.GetNames(typeof(CsvColumn))));
```
https://learn.microsoft.com/en-us/dotnet/api/system.enum.getnames?view=netframework-4.7.2
In the following logic using the enum references we need to explicitly cast the enum values to int. If you were using int constants instead, then the (int) explicit cast is not required. Either way now we can immediately understand the intent of the logic, wihtout having to remember what the columns at index 5 and 6 are supposed to mean.
```
 if (decimal.TryParse(cellArray[(int)CsvColumn.Betrag], NumberStyles.Any, ci, out decimal betrag))
 {
     if (betrag >= 0)
     {
         cellArray[(int)CsvColumn.Soll] = "42590";
         cellArray[(int)CsvColumn.Haben] = "441206";
     }
     else
     {
         cellArray[(int)CsvColumn.Soll] = "441206";
         cellArray[(int)CsvColumn.Haben] = "42590";
     }

     // Assuming we only write to the purple field when the green field was a decimal:
     cellArray[(int)CsvColumn.Belegnummer] = "a dummy text";
 }
```
View a fiddle of this implementation: https://dotnetfiddle.net/Cd10Cd

Of course a similar technique could be used for the "42590" and "441206" values, these must have some sort of business relevance/significance. So store them again as constant named string variables.
- What you have here I refer to as Magic Strings, they have no meaning and could easily be corrupted during code refactoring, if the discrete value has a specific business meaning, then it should also have a specific name in the code.

OO Approach

Using an Object-Oriented approach can mean a lot of things, in this context there are 3 different concerns or processes that we want to separate, Parsing the input, executing business logic, Formatting the output. You could simply make 3 methods that accept an array of strings, but this code becomes hard to understand, by using a structured object to model our business domain concept of a row from the CSV file we can remove many auumptions, for instance, which element in the array is the Betrag (Value).

View the OO Fiddle here: https://dotnetfiddle.net/tjxcQN

You could use this Object-Oriented concept in the above code directly, parsing each line into the object, processing and serializing back to a string value all in one code block, however that makes it hard to gain a higer level view of the process which is necessary to understand the code itself. Even if you do this in your head, when we look at our peer's code, we break it down into blocks or discrete steps. So to be a good coding citizen, modularise your logic into functional methods where you can, it will assist you in the future when you need to write unit tests and it will help to keep your code clean, but also to allow us to extend your code in the future.

For this example we will create a simple model to represent each line. Note that this example takes the extra step of parsing the date fields into DateTime properties even though you do not need them for this example. I am deliberately using constants instead of an enum to show you a different concept. You use what ever makes sense on the day, this is still a first principals approach, there are different libraries you can use to manage serialization to and from CSV, XML, JSON and other text formats.

If your business needs are to display this information in an application, rather than just reading a file and writing directly back out to another file, then this information may be helpful to you, otherwise it is a good habit to get into if you are just practising because larger applications or larger teams will require this level of modularisation, which itself is not specifically an OO concept... The OO part comes from where we define the processing logic, in this example the BankRecord contains the logic to parse the CSV string input and how to serialize back to a CSV output.

 public class BankRecord
 {
     ///  Receipt Date 
     public DateTime Belegdatum { get; set; }
     ///  Entry Date 
     public DateTime Buchungsdatum { get; set; }
     /// Sequence number
     public string Belegnummer { get; set; }
     /// Memo - Description
     public string Buchungstext { get; set; }
     /// Purpose
     public string Verwendungszweck { get; set; }
     /// Debit
     public string Soll { get; set; }
     /// Credit
     public string Haben { get; set; }
     /// Amount
     public decimal Betrag { get; set; }
     /// Currency
     public string Währung { get; set; }

     ///  Column Index Definitions to simplify the CSV parsing
     public static class Columns
     {
         public const int Belegdatum = 0;
         public const int Buchungsdatum = 1;
         public const int Belegnummer = 2;
         public const int Buchungstext = 3;
         public const int Verwendungszweck = 4;
         public const int Soll = 5;
         public const int Haben = 6;
         public const int Betrag = 7;
         public const int Währung = 8;

         /// 
         /// Construct a CSV Header row from these column definitions
         /// 
         public static string BuildCsvHeader()
         {
             return String.Join(',',
                 nameof(Belegdatum),
                 nameof(Buchungsdatum),
                 nameof(Belegnummer),
                 nameof(Buchungstext),
                 nameof(Verwendungszweck),
                 nameof(Soll),
                 nameof(Haben),
                 nameof(Betrag),
                 nameof(Währung));
         }

     }


     /// 
     /// Parse a CSV string using the  definitions as the index for each of the named properties in this class
     /// 
     /// The CSV Line to parse
     /// An object that supplies culture-specific formatting information.
     /// BankRecord populated from the input string
     public static BankRecord FromCSV(string csvLine, IFormatProvider provider)
     {
         var cellArray = Regex.Split(csvLine, @"[	,](?=(?:[^\""]|\""[^\""]*\"")*$)")
                 .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", "")).ToArray();

         // TODO: add in some validation, today we'll just check the number of cells.
         if (cellArray.Length != 9)
             throw new NotSupportedException("Input CSV did not contain the expected number of columns. (Expected 9)");

         // The following is redimentary and doesn't perform any active error checking, the good news is that when it fails you 
         // will atleast know that it was in this specific method. Proper production level error handling is out of scope for this issue.
         var transaction = new BankRecord();
         transaction.Belegdatum = DateTime.Parse(cellArray[Columns.Belegdatum], provider);
         transaction.Buchungsdatum = DateTime.Parse(cellArray[Columns.Buchungsdatum], provider);
         transaction.Belegnummer = cellArray[Columns.Belegnummer];
         transaction.Buchungstext = cellArray[Columns.Buchungstext];
         transaction.Verwendungszweck = cellArray[Columns.Verwendungszweck];
         transaction.Soll = cellArray[Columns.Soll];
         transaction.Haben = cellArray[Columns.Haben];
         transaction.Betrag = Decimal.Parse(cellArray[Columns.Betrag], provider);
         transaction.Währung = cellArray[Columns.Währung];

         return transaction;
     }

     /// 
     /// Write this object out to a CSV string that can be interpreted using the  definitions as the index for each of the named properties in this class
     /// 
     /// An object that supplies culture-specific formatting information.
     /// CSV string that represents this record./returns>
     public string ToCSV(IFormatProvider provider)
     {
         return String.Join(',',
             CsvEscape(Belegdatum, provider),
             CsvEscape(Buchungsdatum, provider),
             CsvEscape(Belegnummer, provider),
             CsvEscape(Buchungstext, provider),
             CsvEscape(Verwendungszweck, provider),
             CsvEscape(Soll, provider),
             CsvEscape(Haben, provider),
             CsvEscape(Betrag, provider),
             CsvEscape(Währung, provider));
     }

     /// 
     /// Simple routine to format a value for CSV output
     /// 
     /// The value to serialize
     /// An object that supplies culture-specific formatting information.
     /// Value escaped and safe for direct inclusion in a CSV output
     private string CsvEscape(object value, IFormatProvider provider)
     {
         if (value == null)
             return string.Empty;
         string stringValue = String.Format(provider, "{0}", value);
         if (stringValue.Contains(','))
             return $"\"{stringValue}\"";
         else
             return stringValue;
     }

     /// 
     /// Format a Date value for CSV output
     /// 
     /// The value to serialize
     /// An object that supplies culture-specific formatting information.
     /// Simple override to allow for common syntax between types, removes the need to the caller to understand the differences
     /// Value escaped and safe for direct inclusion in a CSV output
     private string CsvEscape(DateTime value, IFormatProvider provider)
     {
         string stringValue = String.Format(provider, "{0:d}", value);
         if (stringValue.Contains(','))
             return $"\"{stringValue}\"";
         else
             return stringValue;
     }
 }

The following is the process logic:

 CultureInfo ci = new CultureInfo("de-DE"); // neccessary only when running the code from other cultures.

 // I'll leave this in, but don't call your list, "liste" instead give it some context or meaing, like "records" or "transactions"
 List liste = new List();

 SaveFileDialog dialog = new SaveFileDialog();
 dialog.Filter = "CVS (*.cvs)|*.csv|All files (*.*)|*.*";

 if (dialog.ShowDialog() == true)
 {
     string line;

     // Read the file line by line.  
     try
     {
         #region Parse the input File
         System.IO.StreamReader file = new System.IO.StreamReader(path);
         while ((line = file.ReadLine()) != null)
         {
             try
             {
                 liste.Add(BankRecord.FromCSV(line, ci));
             }
             catch
             {
                 // TODO: re-raise or otherwise handle this error if you want.
                 //       today we will simply ignore erroneous entries and will suppress this error
             }
         }
         #endregion Parse the input File


         #region Evaluate your business rules
         // Evaluate your business rules here, natively in C#, no arrays or indexes, just manipulate the business domain object.
         // assuming that Belegnummer is a sequencing number, not sure if it is from the start of this file or a different context...
         // This just demonstrates a potential reason for NOT encapsulating the processing logic inside the BankRecord class.
         int previousLineNumber = 47; // aribrary start...
         foreach (var transaction in liste)
         {
             // Check value of Betrag, only operate on it if there is a decimal value there
             if (transaction.Betrag >= 0)
             {
                 transaction.Soll = "42590";
                 transaction.Haben = "441206";
             }
             else
             {
                 transaction.Soll = "441206";
                 transaction.Haben = "42590";
             }

             transaction.Belegnummer = $"#{++previousLineNumber}";
         }
         #endregion Evaluate your business rules


         #region Now write to the output
         List outputLines = new List();
         outputLines.Add(BankRecord.Columns.BuildCsvHeader());
         outputLines.AddRange(liste.Select(x => x.ToCSV(ci)));

         File.WriteAllLines(dialog.FileName, outputLines);
         file.Close();
         #endregion Now write to the output

     }
     catch
     {
         MessageBox.Show("Der Gewählte Prozess wird bereits von einem anderen verwendet,
 " +
             " bitte versuchen sie es erneut");
     }
 }

Final Output:

Belegdatum,Buchungsdatum,Belegnummer,Buchungstext,Verwendungszweck,Soll,Haben,Betrag,Währung
31.10.2019,01.11.2019,#48,Some Text,,42590,441206,"50,43",EUR
31.10.2019,01.11.2019,#49,Some Text,,441206,42590,"-239,98",EUR
31.10.2019,31.10.2019,#50,Some Text,,441206,42590,-500,EUR

Belegdatum	Buchungsdatum	Belegnummer	Buchungstext	Soll	Haben	Betrag	Währung
31.10.2019	01.11.2019	#48	Some Text	42590	441206	50,43	EUR
31.10.2019	01.11.2019	#49	Some Text	441206	42590	-239,98	EUR
31.10.2019	31.10.2019	#50	Some Text	441206	42590	-500	EUR

How can I change certain cells in a certain column in a CSV file with C # when a certain condition is triggered?

Answers (1)

tldr; - Show me the codez!

Related Questions