min2bro
min2bro

Reputation: 4638

Create multirow dataframe from Single row dataframe

My dataframe is a single row with N columns:

   col1    col2    col3    col4   col5    col6   col7     col8    col9      
0  NBA      Mens    Sports  LAL    Lakers   BOS   Celtics    SAS    Spurs    

Output that I'm looking for is a new dataframe as shown below, Here the first three column value is same for all the rows. However the col4 and col5 values in this new dataframe is replaced by the above DF column values col4,col5 and col6,col 7 and col8,col9 and so on for each row in new dataframe

    col1    col2    col3    col4    col5
0   NBA    Mens   Sports    LAL     Lakers
1   NBA    Mens   Sports    BOS     Celtics
2   NBA    Mens   Sports    SAS     Spurs

My Code:

Convert the single row to ndarray

import pandas as pd
df = pd.read_csv('df_info.txt', sep=",", header=0)
vallist=df.as_matrix()[0]

Create a Dict to store the values

dict={}
n=4
varlist1=[]
for i in range(len(vallist)):
    if(n<=9):
        dict[i]={}
        print(vallist[n],vallist[n+1])
        dict[i]['col1']=vallist[0]
        dict[i]['col2']=vallist[1]
        dict[i]['col3']=vallist[2]
        dict[i]['col4']=vallist[n]
        dict[i]['col5']=vallist[n+1]
        n+=2

Import the dict to Dataframe

df2=pd.DataFrame.from_dict(dict)
df2.transpose()

I'm getting the required result but I'm not convinced, looking for more pythonic and panda ways to achieve this.

Upvotes: 2

Views: 357

Answers (2)

piRSquared
piRSquared

Reputation: 294318

We can use a comprehension and clever unpacking.

  • for each row I grab the first three values and the rest

    a, b, c, *x in df.values
    
  • I then cycle through each pair in the rest by zipping together x[::2] and x[1::2]

  • Use rename and add_prefix to nail down the column names.

  • This generalizes to any number of pairs subsequent to the first three columns.

pd.DataFrame([
    [a, b, c, d, e]
    for a, b, c, *x in df.values
    for d, e in zip(x[::2], x[1::2])
]).rename(columns=lambda x: x + 1).add_prefix('col')

  col1  col2    col3 col4     col5
0  NBA  Mens  Sports  LAL   Lakers
1  NBA  Mens  Sports  BOS  Celtics
2  NBA  Mens  Sports  SAS    Spurs

Upvotes: 3

jpp
jpp

Reputation: 164693

Using numpy.repeat and itertools.chain with some dictionary comprehensions:

import numpy as np
from itertools import chain

df['abbr_combined'] = list(zip(df.col4, df.col6, df.col8))
df['team_combined'] = list(zip(df.col5, df.col7, df.col9))

lens = df['team_combined'].map(len)

res = pd.DataFrame({**{col: np.repeat(df[col], lens) for col in ('col1', 'col2', 'col3')},
                    **{col: list(chain.from_iterable(df[name])) for col, name in
                            zip(('col4', 'col5'), ('abbr_combined', 'team_combined'))}})

print(res)

  col1  col2    col3 col4     col5
0  NBA  Mens  Sports  LAL   Lakers
0  NBA  Mens  Sports  BOS  Celtics
0  NBA  Mens  Sports  SAS    Spurs

Upvotes: 1

Related Questions