Reputation: 11
I have a dataframe with multiple columns, the most important of which is a Headline. I need to split this headline into multiple columns (let's say 4 columns), and each of these resulting 4 columns has an individual length constraint (column1 = 10 chars, column2 = 15 chars, column3 = 15, column4 = 25). I have researched ways of using textwrap to do this, but can't determine how to apply textwrap to a dataframe. An iterative process to split the full string into its words and recompile while checking the recompiled length against the constraint could be an option as well.
Example Headline: Act fast. Limited space available.
Result
Column1: Act fast.
Column2: Limited space
Column3: available.
Column4: (blank)
To make this really fun, I'm a neophyte at Python - so please be gentle.
Upvotes: 1
Views: 444
Reputation: 43169
See the whole solution here:
import pandas as pd
d = {'junk': 'Act fast. Limited space available.'}
df = pd.DataFrame(d.values(), columns=['raw_text'])
df = df['raw_text'].str.extract(r'^(?P<column1>.{1,10}\b)(?P<column2>.{1,15}\b)(?P<column3>.{1,15}\b)(?P<column4>.{0,25}\b)', expand=True)
print(df)
This yields
column1 column2 column3 column4
0 Act fast. Limited space available
Upvotes: 1