Issue while splitting pandas dataframe column into n columns

Question

I have a dataframe with below column Title. Each sentence is repeating three times. I want to split into three columns evenly.

    Title
0  [1.3] Avg ticket size - merchant vs industry and benchamark (processed data).[1.3] Avg ticket size - merchant vs industry and benchamark (processed data).[1.3] Avg ticket size - merchant vs industry and benchamark (processed data)
1  [10.1] Overall portfolio and benchmarks for the bank over the last 5 quarters.[10.1] Overall portfolio and benchmarks for the bank over the last 5 quarters.[10.1] Overall portfolio and benchmarks for the bank over the last 5 quarters
2  [10.10] Decline Reasons (Quarter-wise)- E-Com vs. POS and comparison with benchmark.[10.10] Decline Reasons (Quarter-wise)- E-Com vs. POS and comparison with benchmark.[10.10] Decline Reasons (Quarter-wise)- E-Com vs. POS and comparison with benchmark
3  [10.2] QoQ Pct of Transaction Method Spend, Transaction Method Active Cards.[10.2] QoQ Pct of Transaction Method Spend, Transaction Method Active Cards.[10.2] QoQ Pct of Transaction Method Spend, Transaction Method Active Cards
4  [10.3] Pct of Transaction Method Spend, Transaction Method Active Cards by Product Type.[10.3] Pct of Transaction Method Spend, Transaction Method Active Cards by Product Type.[10.3] Pct of Transaction Method Spend, Transaction Method Active Cards by Product Type
5  [10.4] QoQ of average ticket size, transactions per card, average spend per card.[10.4] QoQ of average ticket size, transactions per card, average spend per card.[10.4] QoQ of average ticket size, transactions per card, average spend per card

I tried below code but did not work as expected.

import textwrap
pd.DataFrame([textwrap.wrap(el, len(el)//3) for el in df['Title']]).add_prefix('Title')

I want to split by finding the length of string and then split based on len(string)/3. Because sometimes there will be . in middle of sentence

Please help

yatu · Accepted Answer

Looks like you can use str.split here:

df.Title.str.split(r'$$\d+\.\d+$$\s', expand=True)

0                                                  1  \
0    Avg ticket size - merchant vs industry and ben...   
1    Overall portfolio and benchmarks for the bank ...   
2    Decline Reasons (Quarter-wise)- E-Com vs. POS ...   
3    QoQ Pct of Transaction Method Spend, Transacti...   
4    Pct of Transaction Method Spend, Transaction M...   
5    QoQ of average ticket size, transactions per c...   

                                                   2  \
0  Avg ticket size - merchant vs industry and ben...   
1  Overall portfolio and benchmarks for the bank ...   
2  Decline Reasons (Quarter-wise)- E-Com vs. POS ...   
3  QoQ Pct of Transaction Method Spend, Transacti...   
4  Pct of Transaction Method Spend, Transaction M...   
5  QoQ of average ticket size, transactions per c...   

                                                   3  
0  Avg ticket size - merchant vs industry and ben...  
1  Overall portfolio and benchmarks for the bank ...  
2  Decline Reasons (Quarter-wise)- E-Com vs. POS ...  
3  QoQ Pct of Transaction Method Spend, Transacti...  
4  Pct of Transaction Method Spend, Transaction M...  
5  QoQ of average ticket size, transactions per c...

Update

If you want to find the length of the strings in each row and split into 3, one approach could be:

n = 3
lens = df.Title.str.len()//n
l = [[i[(c-1)*sl:c*sl] for i, sl in zip(df.Title, lens)] for c in range(1, n+1)]
pd.DataFrame.from_records(l).T

                                0  \
0  [1.3] Avg ticket size - merchant vs industry a...   
1  [10.1] Overall portfolio and benchmarks for th...   
2  [10.10] Decline Reasons (Quarter-wise)- E-Com ...   
3  [10.2] QoQ Pct of Transaction Method Spend, Tr...   
4  [10.3] Pct of Transaction Method Spend, Transa...   
5  [10.4] QoQ of average ticket size, transaction...   

                                                   1  \
0  .[1.3] Avg ticket size - merchant vs industry ...   
1  .[10.1] Overall portfolio and benchmarks for t...   
2  .[10.10] Decline Reasons (Quarter-wise)- E-Com...   
3  .[10.2] QoQ Pct of Transaction Method Spend, T...   
4  .[10.3] Pct of Transaction Method Spend, Trans...   
5  .[10.4] QoQ of average ticket size, transactio...   

                                                   2  
0  ).[1.3] Avg ticket size - merchant vs industry...  
1  s.[10.1] Overall portfolio and benchmarks for ...  
2  k.[10.10] Decline Reasons (Quarter-wise)- E-Co...  
3  s.[10.2] QoQ Pct of Transaction Method Spend, ...  
4  e.[10.3] Pct of Transaction Method Spend, Tran...  
5  d.[10.4] QoQ of average ticket size, transacti...

Issue while splitting pandas dataframe column into n columns

Answers (2)

Related Questions