Raghu
Raghu

Reputation: 181

How to convert single column data into multiple columns in python dataframe?

I have a data frame (df) with column col1 which has many rows and there are rows with a common string (Collection of numbers are) and ending with different numbers(001, 002, 005). I want to extract the rows between two string (Collection of numbers are 002 to Collection of numbers are 003) and assign them to new column with same row name (Collection of numbers are 002)

    col1
0   Collection of numbers are 002
1   53
2   20
3   56
4   Collection of numbers are 003
5   236
6   325
7   Collection of numbers are 005
8   96
9   23
10  63

I want to convert the how above data frame to the following format.

0   Collection of numbers are 002   Collection of numbers are 003   Collection of numbers are 005
1   53                              236                              96
2   20                              325                              23
3   56                                                               63

Note: No repeated numbers

Upvotes: 3

Views: 4387

Answers (4)

Ben.T
Ben.T

Reputation: 29635

you can use set_index and unstack. I steal the idea of @Datanovice for extracting the name of the future columns and use groupby.cumcount to get the future index numbers:

arrCollection = df['col1'].str.extract('(Collection.*)').ffill()[0].to_numpy()
df_f = df.set_index([df.groupby(arrCollection)['col1'].cumcount()-1,
                     arrCollection])['col1']\
         .unstack().iloc[1:,:]

print (df_f)
  Collection 002 Collection 003 Collection 005
0             53            236             96
1             20            325             23
2             56            NaN             63

Note: the name of the columns will be like in your example, I did not use the exact same input

Upvotes: 1

sempersmile
sempersmile

Reputation: 481

The answer provided by Datanovic seems good. An alternative solution would be following function:

def extract_columns(df, column, common_string):
    df_list = df[column].tolist()
    df_new = pd.DataFrame()
    row_indices = []
    cols = []
    for ind, elem in enumerate(df_list):
        if common_string in str(elem):
            row_indices.append(ind)
            cols.append(elem)

    row_indices.append(len(df_list))

    for ind, col in enumerate(cols): 
        df_new[col] = pd.Series(df_list[row_indices[ind]+1:row_indices[ind+1]])

    return df_new

So with your example dataframe you will get following result when calling the function extract_columns(df, 'col1', 'Collection of numbers are')

   Collection of numbers are 002  Collection of numbers are 003  Collection of numbers are 005
0                             53                          236.0                             96
1                             20                          325.0                             23
2                             56                            NaN                             63

Upvotes: 0

trigonom
trigonom

Reputation: 528

in

                    col1
0   c of numbers are 002
1                      1
2                      2
3                      3
4   c of numbers are 003
5                     55
6                     66
7   c of numbers are 005
8                     45
9                     23
10                    12
11                   456
12                    56

for_concat = []
col = []
for i,r in df.iterrows():
    if "numbers" in str(r["col1"]):
        if col:
            for_concat.append(pd.DataFrame(col,columns=[col_name]))
            col_name = r["col1"]
            col = []
        else:
            col_name = r["col1"]
    else:
        col.append(r["col1"])
for_concat.append(pd.DataFrame(col,columns=[col_name]))
out = pd.concat(for_concat, axis =1)

out:

   c of numbers are 002  c of numbers are 003  c of numbers are 005
0                   1.0                  55.0                    45
1                   2.0                  66.0                    23
2                   3.0                   NaN                    12
3                   NaN                   NaN                   456
4                   NaN                   NaN                    56

Upvotes: 0

Umar.H
Umar.H

Reputation: 23099

We could try ffill and some basic resphaping using str.split

df['headers'] = df['col1'].str.extract('(Collection.*)').ffill()


df1 = df[~df['col1'].str.contains('Collection')].copy()


df1.groupby('headers').agg(','.join)['col1'].str.split(',',expand=True).T.rename_axis('',axis='columns')

out:

  Collection of numbers are 002 Collection of numbers are 003  \
0                            53                           236   
1                            20                           325   
2                            56                          None   

  Collection of numbers are 005  
0                            96  
1                            23  
2                            63  

Upvotes: 4

Related Questions