Reputation: 2780
I am working with 2 data frames that I created based from an Excel file. One data frame contains values that are separated with commas, that is,
df1 df2
----------- ------------
0 LFTEG42 X,Y,Z
1 JOCOROW 1,2
2 TLR_U01 I
3 PR_UDG5 O,M
df1 and df2 are my column names. My intention is to merge the two data frames and generate the following output:
desired result
----------
0 LFTEG42X
1 LFTEG42Y
2 LFTEG42Z
3 JOCOROW1
4 JOCOROW2
5 TLR_U01I
6 .....
n PR_UDG5M
This is the code that I used but I ended up with the following result:
input_file = pd.ExcelFile \
('C:\\Users\\devel\\Desktop_12\\Testing\\latest_Calculation' + str(datetime.now()).split(' ')[0] + '.xlsx')
# convert the worksheets to dataframes
df1 = pd.read_excel(input_file, index_col=None, na_values=['NA'], parse_cols="H",
sheetname="Analysis")
df2 = pd.read_excel(input_file, index_col=None, na_values=['NA'], parse_cols="I",
sheetname="Analysis")
data_frames_merged = df1.append(df2, ignore_index=True)
current result
--------------
NaN XYZ
NaN 1,2
NaN I
... ...
PR_UDG5 NaN
Questions
Upvotes: 1
Views: 1071
Reputation: 294488
Setup
df1 = pd.DataFrame(dict(A='LFTEG42 JOCOROW TLR_U01 PR_UDG5'.split()))
df2 = pd.DataFrame(dict(A='X,Y,Z 1,2 I O,M'.split()))
Getting creative
df1.A.repeat(df2.A.str.count(',') + 1) + ','.join(df2.A).split(',')
0 LFTEG42X
0 LFTEG42Y
0 LFTEG42Z
1 JOCOROW1
1 JOCOROW2
2 TLR_U01I
3 PR_UDG5O
3 PR_UDG5M
dtype: object
Upvotes: 2
Reputation: 38415
Dirty one-liner
new_df = pd.concat([df1['df1'], df2['df2'].str.split(',', expand = True).stack()\
.reset_index(1,drop = True)], axis = 1).sum(1)
0 LFTEG42X
0 LFTEG42Y
0 LFTEG42Z
1 JOCOROW1
1 JOCOROW2
2 TLR_U01I
3 PR_UDG5O
3 PR_UDG5M
Upvotes: 3
Reputation: 9274
Also, similar to @vaishali except using melt
df = pd.concat([df1,df2['df2'].str.split(',',expand=True)],axis=1).melt(id_vars='df1').dropna().drop('variable',axis=1).sum(axis=1)
0 LFTEG42X
1 JOCOROW1
2 TLR_U01I
3 PR_UDG5O
4 LFTEG42Y
5 JOCOROW2
7 PR_UDG5M
8 LFTEG42Z
Upvotes: 2
Reputation: 323316
I break down the steps
df=pd.concat([df1,df2],axis=1)
df.df2=df.df2.str.split(',')
df=df.set_index('df1').df2.apply(pd.Series).stack().reset_index().drop('level_1',1).rename(columns={0:'df2'})
df['New']=df.df1+df.df2
df
Out[34]:
df1 df2 New
0 LFTEG42 X LFTEG42X
1 LFTEG42 Y LFTEG42Y
2 LFTEG42 Z LFTEG42Z
3 JOCOROW 1 JOCOROW1
4 JOCOROW 2 JOCOROW2
5 TLR_U01 I TLR_U01I
6 PR_UDG5 O PR_UDG5O
7 PR_UDG5 M PR_UDG5M
Data Input :
df1
Out[36]:
df1
0 LFTEG42
1 JOCOROW
2 TLR_U01
3 PR_UDG5
df2
Out[37]:
df2
0 X,Y,Z
1 1,2
2 I
3 O,M
Upvotes: 4