Reputation: 4638
My dataframe is a single row with N columns:
col1 col2 col3 col4 col5 col6 col7 col8 col9
0 NBA Mens Sports LAL Lakers BOS Celtics SAS Spurs
Output that I'm looking for is a new dataframe as shown below, Here the first three column value is same for all the rows. However the col4 and col5 values in this new dataframe is replaced by the above DF column values col4,col5 and col6,col 7 and col8,col9 and so on for each row in new dataframe
col1 col2 col3 col4 col5
0 NBA Mens Sports LAL Lakers
1 NBA Mens Sports BOS Celtics
2 NBA Mens Sports SAS Spurs
My Code:
Convert the single row to ndarray
import pandas as pd
df = pd.read_csv('df_info.txt', sep=",", header=0)
vallist=df.as_matrix()[0]
Create a Dict to store the values
dict={}
n=4
varlist1=[]
for i in range(len(vallist)):
if(n<=9):
dict[i]={}
print(vallist[n],vallist[n+1])
dict[i]['col1']=vallist[0]
dict[i]['col2']=vallist[1]
dict[i]['col3']=vallist[2]
dict[i]['col4']=vallist[n]
dict[i]['col5']=vallist[n+1]
n+=2
Import the dict to Dataframe
df2=pd.DataFrame.from_dict(dict)
df2.transpose()
I'm getting the required result but I'm not convinced, looking for more pythonic and panda ways to achieve this.
Upvotes: 2
Views: 357
Reputation: 294318
We can use a comprehension and clever unpacking.
for each row I grab the first three values and the rest
a, b, c, *x in df.values
I then cycle through each pair in the rest by zipping together x[::2]
and x[1::2]
Use rename
and add_prefix
to nail down the column names.
pd.DataFrame([
[a, b, c, d, e]
for a, b, c, *x in df.values
for d, e in zip(x[::2], x[1::2])
]).rename(columns=lambda x: x + 1).add_prefix('col')
col1 col2 col3 col4 col5
0 NBA Mens Sports LAL Lakers
1 NBA Mens Sports BOS Celtics
2 NBA Mens Sports SAS Spurs
Upvotes: 3
Reputation: 164693
Using numpy.repeat
and itertools.chain
with some dictionary comprehensions:
import numpy as np
from itertools import chain
df['abbr_combined'] = list(zip(df.col4, df.col6, df.col8))
df['team_combined'] = list(zip(df.col5, df.col7, df.col9))
lens = df['team_combined'].map(len)
res = pd.DataFrame({**{col: np.repeat(df[col], lens) for col in ('col1', 'col2', 'col3')},
**{col: list(chain.from_iterable(df[name])) for col, name in
zip(('col4', 'col5'), ('abbr_combined', 'team_combined'))}})
print(res)
col1 col2 col3 col4 col5
0 NBA Mens Sports LAL Lakers
0 NBA Mens Sports BOS Celtics
0 NBA Mens Sports SAS Spurs
Upvotes: 1