Burak

Reputation: 245

How to get PDF output in one CSV line

With my program, the CSV output starts a new line with every input.

Is there a way to get it all on one line?

My current code:

static void Main(string[] args)
{
    string path = @"C:\Users\burak\Desktop\todo";
    StreamWriter write = new StreamWriter(@"C:\Users\burak\Desktop\todo\test.csv");
    foreach (var file in Directory.GetFiles(path, "*.pdf", SearchOption.TopDirectoryOnly))
    {
        StringBuilder text = new StringBuilder();
        PdfReader pdfReader = new PdfReader(file);
        string currentText ="";

        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            currentText = string.Join(";", currentText.Split(' ', ':', '/'));
            currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
            // text.Append(currentText);
            pdfReader.Close();
        }
        
        text.ToString();
        write.Write(currentText);
        Console.WriteLine(text.ToString());
    }
    write.Close();
}

What I tried:

I tried splitting on the spaces to combine everything into one line, but that didn't work at all.

Upvotes: 0

Views: 377

Answers (2)

IndieGameDev

Reputation: 2974

To remove all line breaks, replace them with an empty string. System.Environment.NewLine gives you the line-break sequence of the current system. After the replacement, the extracted PDF text is on a single line. To start a new line for each PDF file, append System.Environment.NewLine to the end of the string and then write the result to the CSV file.

Example:

static void Main(string[] args) {
    // ...
    StreamWriter write = new StreamWriter(@"C:\Users\burak\Desktop\todo\test.csv");
    // ...

    foreach (var file in Directory.GetFiles(path, "*.pdf", SearchOption.TopDirectoryOnly)) {
        // ...

        for (int page = 1; page <= pdfReader.NumberOfPages; page++) {
            // ...
            currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            // ...
        }

        // Remove the line breaks from the extracted text
        currentText = currentText.Replace(System.Environment.NewLine, string.Empty);
        // End the row for this PDF file with a single line break
        currentText += System.Environment.NewLine;
        write.Write(currentText);
    }
    write.Close();
}
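
Note that in the loop above currentText is overwritten on every iteration, so only the last page reaches the CSV. A minimal sketch of one way to keep all pages on one row follows. The class CsvRowSketch and the method ToSingleLine are hypothetical names, not part of iTextSharp, and the sketch assumes the extracted text may contain a bare "\n" instead of the platform line break, which is why it strips "\r" and "\n" separately:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvRowSketch
{
    // Hypothetical helper (not part of iTextSharp): flattens the text of all
    // pages of one PDF into a single line and terminates it with a line break.
    public static string ToSingleLine(IEnumerable<string> pageTexts)
    {
        var row = new StringBuilder();
        foreach (string pageText in pageTexts)
        {
            // Strip both characters so "\r\n", "\n" and "\r" are all removed.
            row.Append(pageText.Replace("\r", string.Empty).Replace("\n", string.Empty));
        }
        // End the row so the next PDF file starts on a new line.
        row.Append(Environment.NewLine);
        return row.ToString();
    }
}
```

To use it, you could collect each page's text in a List<string> inside the page loop and call write.Write(CsvRowSketch.ToSingleLine(pages)) once per file. Because the line breaks are removed rather than replaced, words on either side of a break run together; replacing them with " " or ";" instead may suit a CSV better.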

Upvotes: 0

Xavi Anguera

Reputation: 105

Maybe there is a CR or LF in the input text. You can try this:

write.Write(currentText.Replace("\r", "").Replace("\n", ""));

Upvotes: 1
