Reputation: 245
With my program, the CSV file gets a new line for every input.
Is there a way to get it all on one line?
My current code:
static void Main(string[] args)
{
    string path = @"C:\Users\burak\Desktop\todo";
    StreamWriter write = new StreamWriter(@"C:\Users\burak\Desktop\todo\test.csv");
    foreach (var file in Directory.GetFiles(path, "*.pdf", SearchOption.TopDirectoryOnly))
    {
        StringBuilder text = new StringBuilder();
        PdfReader pdfReader = new PdfReader(file);
        string currentText = "";
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            currentText = string.Join(";", currentText.Split(' ', ':', '/'));
            currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
            // text.Append(currentText);
            pdfReader.Close();
        }
        text.ToString();
        write.Write(currentText);
        Console.WriteLine(text.ToString());
    }
    write.Close();
}
What I tried:
I tried working with the spaces to combine everything into one line, but that didn't work at all.
Upvotes: 0
Views: 377
Reputation: 2974
To remove all line breaks, we can replace them with an empty string. To get the line break of the current system, use System.Environment.NewLine. Once the line breaks are removed, all the PDF text from all pages is on the same line. To start a new line for each PDF file, we can append System.Environment.NewLine to the end of the string and then write the whole PDF to the CSV file.
Example:
static void Main(string[] args) {
    // ...
    StreamWriter write = new StreamWriter(@"C:\Users\burak\Desktop\todo\test.csv");
    // ...
    foreach (var file in Directory.GetFiles(path, "*.pdf", SearchOption.TopDirectoryOnly)) {
        // ...
        for (int page = 1; page <= pdfReader.NumberOfPages; page++) {
            // ...
            currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            // ...
        }
        // Replace newLines
        currentText = currentText.Replace(System.Environment.NewLine, string.Empty);
        // Add newLine to currentText
        currentText += System.Environment.NewLine;
        write.Write(currentText);
    }
    write.Close();
}
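One thing to watch: in the loop above, currentText is overwritten on every page, so only the last page would end up in the file. Here is a rough sketch of how the per-page text could be accumulated first and then flattened. It leaves the PDF library out so it runs on its own: SingleLineCsv and ToSingleLine are made-up names, and the pageTexts strings stand in for whatever PdfTextExtractor.GetTextFromPage returns. It also strips "\r" and "\n" individually, on the assumption that the extracted text may contain a bare "\n" that Environment.NewLine would not match on Windows.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SingleLineCsv
{
    // Hypothetical helper: concatenates the text of all pages of one
    // document and removes every CR and LF character from the result.
    public static string ToSingleLine(IEnumerable<string> pageTexts)
    {
        var text = new StringBuilder();
        foreach (string pageText in pageTexts)
        {
            // Append instead of overwrite, so earlier pages are kept.
            text.Append(pageText);
        }
        return text.ToString().Replace("\r", "").Replace("\n", "");
    }

    public static void Main()
    {
        // Stand-in for the text extracted from two pages of one PDF.
        var pages = new[] { "Invoice;1;\n", "Total;42\r\n" };

        // One WriteLine per document gives one CSV row per PDF file.
        Console.WriteLine(ToSingleLine(pages));
    }
}
```

In the real program, the Console.WriteLine call would be write.WriteLine(...) on the StreamWriter, once per PDF file.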
Upvotes: 0
Reputation: 105
Maybe there is a CR or LF in the input text. You can try this:
write.Write(currentText.Replace("\r", "").Replace("\n", ""));
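Replacing the line breaks with an empty string glues the last word of one line to the first word of the next. If that matters for your CSV, a variation is to replace each line break with the delimiter instead. The following is only a sketch: LineBreakToDelimiter and ReplaceLineBreaks are made-up names, and the ";" delimiter is taken from the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakToDelimiter
{
    // Hypothetical helper: replaces every line break (CRLF, CR or LF)
    // with the given delimiter.
    public static string ReplaceLineBreaks(string text, string delimiter)
    {
        // "\r\n" is listed first so that a Windows line break produces
        // one delimiter rather than two. The evaluator returns the
        // delimiter literally, so characters like '$' are not interpreted.
        return Regex.Replace(text, @"\r\n|\r|\n", match => delimiter);
    }

    public static void Main()
    {
        string extracted = "Name;Smith\r\nDate;2020\nTotal;42";
        Console.WriteLine(ReplaceLineBreaks(extracted, ";"));
        // prints Name;Smith;Date;2020;Total;42
    }
}
```

With the question's code this would be write.Write(ReplaceLineBreaks(currentText, ";")) in place of the chained Replace calls.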
Upvotes: 1