Perry

Reputation: 11

python3 trying to split string on \x0c

I'm extracting text from a PDF to a string text:

text = "● A justification of your prediction, including the following information that helped form\n\no Angle of the sun relative to the surface on September 22, 2021\no Materials of the surface (include three materials) and heat absorption\n\ncharacteristics\n\no Length of exposure of the surface to the sun (i.e., the amount of time the surface\n\nhas had to warm on that day), including slopes of the stadium and a consideration\nof the angles of the seats\n\n1 Yes, I know that’s a Wednesday but just go with it…\n\n\x0c● Sources: Be sure to include in-text citations as appropriate as well as provide a list of\n\nsources that were used for your report, use MLA or APA citation style\n\n● Your report can assume any format you chose, and should be between 300-400 words in\n\nlength\n\nResources:\n\n"

I want to split this text on "\x0c". I tried re.split(r'[\x0c]+', text), but that simply removes the "\x0c"; it does not split. Likewise, text.splitlines() didn't do the trick.

What am I missing?

Upvotes: 1

Views: 474

Answers (2)

vexem

Reputation: 86

There's probably a cleaner way, but this would be my method:

splittext = text.split('\x0c')  # split on the form feed (page break)
splittext[0] += '\x0c'  # re-attach the form feed to the first part

string1 = splittext[0]
string2 = splittext[1]
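
A possibly cleaner variant of the same idea is str.partition, which splits at the first occurrence of the separator and hands the separator back as its own element. This is only a sketch: sample is a made-up stand-in for the extracted PDF text, and it assumes only the first form feed matters.

```python
# Hypothetical sample standing in for the extracted PDF text.
sample = "page one\n\n\x0cpage two\n"

# partition() splits at the first "\x0c" only and returns a 3-tuple:
# (text before, the separator itself, text after).
head, sep, tail = sample.partition("\x0c")

string1 = head + sep  # first page, with the form feed kept at the end
string2 = tail        # everything after the first form feed
```

If the text contains no form feed at all, sep and tail come back as empty strings, so this does not raise an IndexError the way indexing splittext[1] would.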

Upvotes: 0

2e0byo

Reputation: 5954

What's wrong with plain old

text.split("\x0c")

? That gives me a list of two elements, which looks like what you want here.

You can further split by line if you need to:

sections = [x.split("\n") for x in text.split("\x0c")]
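
To check this on something smaller than the full PDF text, here is a minimal sketch. The sample string is invented for illustration, and the regex variant is only there for comparison with the attempt in the question.

```python
import re

# Hypothetical two-page sample; "\x0c" is the form feed that separates pages.
sample = "first page\nline two\n\n\x0csecond page\nlast line\n"

# Plain str.split on the form feed gives one string per page.
pages = sample.split("\x0c")

# The regex variant also returns a list; with a single form feed
# between pages it yields the same pieces.
pages_re = re.split(r"\x0c+", sample)

# Split each page into lines. Trailing newlines leave empty strings
# at the end of each inner list.
sections = [page.split("\n") for page in pages]

print(len(pages))
print(pages == pages_re)
print(sections[1][0])
```

Both calls return a new list and leave the original string unchanged, so the result has to be assigned to a name, as above, before it can be used.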

Upvotes: 1
