Bilal

Reputation: 65

splitting column into multiple columns

I have a DataFrame containing a single column, and I want to split it into multiple columns:

106
B-PER
I-PER
I-PER
B-PER
I-PER
I-PER
I-PER
B-PER
B-PROPH
109
B-PER
B-PER
I-PER
B-PER
I-PER
B-PER
I-PER
B-PER
I-PER
B-PROPH
116
B-PER
I-PER
I-PER
B-PER
B-PER
B-PER
B-PER

I want to start a new column each time an integer value appears. I know I have to iterate over the rows, but I don't know how to do the split. The required output is:

106          109           116
B-PER        B-PER         B-PER
I-PER        B-PER         I-PER
I-PER        I-PER         I-PER
B-PER        B-PER         B-PER
I-PER        I-PER         B-PER
I-PER        B-PER         B-PER
I-PER        I-PER         B-PER
B-PER        B-PER
B-PROPH      I-PER
             B-PROPH

Upvotes: 3

Views: 106

Answers (3)

yatu

Reputation: 88226

Here's one approach using pivot_table, assuming your column is called 'col':

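# running count of numeric rows: every row gets the number of its block
g = df.col.str.isnumeric().cumsum()
out = df.pivot_table(df,
                   columns=g,
                   # position of each row within its block
                   index=g.reset_index().groupby('col').cumcount(),
                   aggfunc='first',
                   fill_value='')
# the first row holds the numeric values, so use it as the column labels
out.columns = out.loc[0]
# then drop that row (returns a new frame, out itself is unchanged)
out.drop(0)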

0       106      109    116
1     B-PER    B-PER  B-PER
2     I-PER    B-PER  I-PER
3     I-PER    I-PER  I-PER
4     B-PER    B-PER  B-PER
5     I-PER    I-PER  B-PER
6     I-PER    B-PER  B-PER
7     I-PER    I-PER  B-PER
8     B-PER    B-PER       
9   B-PROPH    I-PER       
10           B-PROPH       
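
To see what the two helper series hold before pivoting, here is a small sketch on a shortened sample. The shortened data and the names is_header, block, pos and helper are only for illustration and are not part of the answer above:

```python
import pandas as pd

# Shortened sample in the shape of the question's data (all values are strings).
df = pd.DataFrame({"col": ["106", "B-PER", "I-PER",
                           "109", "B-PER",
                           "116", "B-PER", "B-PROPH"]})

# True on the numeric header rows; the running sum numbers the blocks 1, 2, 3, ...
is_header = df["col"].str.isnumeric()
block = is_header.cumsum()

# Position of each row inside its block (0 is the header row itself).
pos = df.groupby(block).cumcount()

# Side-by-side view of the original values and the two helper series.
helper = pd.DataFrame({"col": df["col"], "block": block, "pos": pos})
print(helper)
```

block ends up as the column key and pos as the row key of the pivot, which is why the numeric headers all land in row 0.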

Upvotes: 1

Umar.H

Reputation: 23099

First, create a key column and a new index using cumcount().

Finally, we can use unstack to move the keys into the columns.

We use iloc[1:] to drop the first row, which only holds the column names.

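# numeric rows become the key, forward-filled over the rows of their block
# (assumes the single column is labelled 0)
df['key'] = pd.to_numeric(df[0], errors='coerce').ffill()
# index by position within each block, then move 'key' into the columns
df1 = df.set_index([df.groupby('key').cumcount(), 'key']).unstack(1).iloc[1:].droplevel(0, 1)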

key    106.0    109.0  116.0
1      B-PER    B-PER  B-PER
2      I-PER    B-PER  I-PER
3      I-PER    I-PER  I-PER
4      B-PER    B-PER  B-PER
5      I-PER    I-PER  B-PER
6      I-PER    B-PER  B-PER
7      I-PER    I-PER  B-PER
8      B-PER    B-PER    NaN
9    B-PROPH    I-PER    NaN
10       NaN  B-PROPH    NaN
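
The column labels come out as floats (106.0) because to_numeric with errors='coerce' has to represent the non-numeric rows as NaN. Below is a minimal self-contained sketch of the same steps that casts the key back to int. It assumes the column is labelled 0 and that the first row is always a number, so nothing is left as NaN after ffill. The names key, pos and wide are illustrative:

```python
import pandas as pd

# Data from the question, as a single column labelled 0.
values = [
    "106", "B-PER", "I-PER", "I-PER", "B-PER", "I-PER", "I-PER", "I-PER",
    "B-PER", "B-PROPH",
    "109", "B-PER", "B-PER", "I-PER", "B-PER", "I-PER", "B-PER", "I-PER",
    "B-PER", "I-PER", "B-PROPH",
    "116", "B-PER", "I-PER", "I-PER", "B-PER", "B-PER", "B-PER", "B-PER",
]
df = pd.DataFrame(values)

# Header rows become numbers, everything else NaN; ffill copies each header
# down its block, and astype(int) is safe only because row 0 is a header.
key = pd.to_numeric(df[0], errors="coerce").ffill().astype(int).rename("key")

# Position of each row within its block (0 is the header row).
pos = df.groupby(key).cumcount()

wide = (
    df.set_index([pos, key])[0]  # Series indexed by (position, key)
      .unstack("key")            # one column per key, NaN where a block is shorter
      .iloc[1:]                  # drop row 0, which only repeats the header values
)
print(wide)
```

Passing fill_value='' to unstack would give empty strings instead of NaN in the shorter columns.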

Upvotes: 1

jezrael

Reputation: 862491

Use:

# flag the rows that hold numeric values (the block headers)
m = df.A.astype(str).str.isnumeric()
# copy each header value down to the rows of its block
df['g'] = df.A.where(m).ffill()
# drop the header rows themselves; their values now live in 'g'
df = df[~m]
# reshape: position within the block becomes the index, 'g' becomes the columns
df1 = df.set_index([df.groupby('g').cumcount(), 'g'])['A'].unstack(fill_value='')

print(df1)

g      106      109    116
0    B-PER    B-PER  B-PER
1    I-PER    B-PER  I-PER
2    I-PER    I-PER  I-PER
3    B-PER    B-PER  B-PER
4    I-PER    I-PER  B-PER
5    I-PER    B-PER  B-PER
6    I-PER    I-PER  B-PER
7    B-PER    B-PER       
8  B-PROPH    I-PER       
9           B-PROPH   
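
Since the question mentions iterating over the rows, here is a plain-Python sketch of that route for comparison. It is a different technique from the unstack approach above, not a restatement of it. split_by_header is a hypothetical helper name, and the sketch assumes every header value is unique and that rows before the first header can be ignored:

```python
import pandas as pd


def split_by_header(values):
    """Collect values under the most recent numeric header (hypothetical helper)."""
    blocks = {}
    current = None
    for v in values:
        text = str(v)
        if text.isnumeric():
            # A numeric value starts a new block (assumes headers are unique).
            current = text
            blocks[current] = []
        elif current is not None:
            # Rows seen before the first header are skipped.
            blocks[current].append(text)
    # Wrapping each list in a Series lets pandas pad the shorter blocks with NaN,
    # which fillna then turns into empty strings.
    frame = pd.DataFrame({k: pd.Series(v, dtype="object") for k, v in blocks.items()})
    return frame.fillna("")


values = [
    "106", "B-PER", "I-PER", "I-PER", "B-PER", "I-PER", "I-PER", "I-PER",
    "B-PER", "B-PROPH",
    "109", "B-PER", "B-PER", "I-PER", "B-PER", "I-PER", "B-PER", "I-PER",
    "B-PER", "I-PER", "B-PROPH",
    "116", "B-PER", "I-PER", "I-PER", "B-PER", "B-PER", "B-PER", "B-PER",
]
print(split_by_header(values))
```

The loop is easier to follow, but for a large frame the vectorised versions avoid a Python-level pass over every row.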

Upvotes: 2
