Eric

Reputation: 1034

I can't split the text extracted from PowerPoint into multiple lines

I extracted some text from a shape and printed it line by line into an output .txt file for viewing before I do what I actually need to do.

The problem is that when I open the output in Notepad++, the extracted text is split across multiple lines, while in regular Notepad it is one big chunk of text. Is there a way to detect that line break so I can split the string?

Here's my code:

int linecounter = 1;
bool isDetailPage = false;
Application pptApplication = new Application();
Presentation pptPresentation = pptApplication.Presentations.Open(file, MsoTriState.msoFalse, MsoTriState.msoFalse, MsoTriState.msoFalse);
foreach (Slide _slide in pptPresentation.Slides) {
  tempOutput.Add("- Parsing Slide " + linecounter);
  foreach (Microsoft.Office.Interop.PowerPoint.Shape _shape in _slide.Shapes) {
    if(_shape.HasTextFrame == MsoTriState.msoTrue) {
      var textFrame = _shape.TextFrame;
      if(textFrame.HasText == MsoTriState.msoTrue) {
        var textRange = textFrame.TextRange;
        Match match = knowldgeSlide.Match(textRange.Text.ToString());
        if (match.Success) {
          isDetailPage = true;
        }
        if(isDetailPage) { //ignore other slides
          string[] lines = textRange.Text.ToString().Split(
            new[] { "\n" },
            StringSplitOptions.None
          );
          int t = 0;
          foreach(string x in lines) {
            tempOutput.Add("line " + t + ": " + x);
            t++;
          }
        }
      }
    }
  }
  isDetailPage = false;
  linecounter++;
}

Here's the text extracted from the PowerPoint, which I want to split into separate strings, one per line. Currently it all ends up in line 0:

line 0: Identify the four benefits you gain from convergence and OTN? (Source: Identify the need for the NCS 4000 Series in the OTN Environment) 
Virtualized network operations
The scalability 
Reduction in transport costs
Flexibility allows operators to employ the technologies
Service contracts

Upvotes: 0

Views: 290

Answers (2)

Nick Garyu

Reputation: 506

Sometimes "\r" is used as a newline character in addition to "\n". If the text shows up in Notepad++ with line breaks, then there is definitely a character there that Notepad++ is picking up on. You can see the symbol for each non-printing character by clicking View > Show Symbol > Show All Characters. With that view enabled, find whatever is at the end of each line and split on that character in your C# code.
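If you would rather check from code than from the editor, a small helper along these lines can list the control characters in the extracted string. This is only a sketch: `ControlCharDump` and `FindControlChars` are hypothetical names, not part of the question's code or of the PowerPoint interop library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code.
public static class ControlCharDump
{
    // Returns one entry per control character found in the text,
    // giving its position and its code point in hex.
    public static List<string> FindControlChars(string text)
    {
        var found = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsControl(text[i]))
            {
                found.Add("index " + i + ": U+" + ((int)text[i]).ToString("X4"));
            }
        }
        return found;
    }
}
```

Inside the question's loop, something like `tempOutput.AddRange(ControlCharDump.FindControlChars(textRange.Text));` would write the findings to the same output file. U+000D is "\r" and U+000A is "\n".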

Upvotes: 1

Idle_Mind

Reputation: 39122

Split on both \r and \n.

I like to do it this way:

string[] lines = textRange.Text.ToString().Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
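A slightly more defensive variant, as a sketch: it also splits on the vertical tab ("\v"), on the assumption that PowerPoint stores manual line breaks (Shift+Enter) that way, and it trims each piece. `SlideTextSplitter` and `SplitLines` are made-up names for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original code.
public static class SlideTextSplitter
{
    // Carriage return, line feed, and vertical tab.
    // Including '\v' is an assumption about how soft line breaks are stored.
    private static readonly char[] LineSeparators = { '\r', '\n', '\v' };

    // Splits on any separator, trims each piece, and drops empty pieces.
    public static string[] SplitLines(string text)
    {
        return text
            .Split(LineSeparators, StringSplitOptions.RemoveEmptyEntries)
            .Select(piece => piece.Trim())
            .Where(piece => piece.Length > 0)
            .ToArray();
    }
}
```

In the question's code this would replace the existing `Split` call, for example `string[] lines = SlideTextSplitter.SplitLines(textRange.Text);`.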

Upvotes: 1
