abautista
abautista

Reputation: 2780

How to combine a data frame with another that contains comma separated values?

I am working with 2 data frames that I created based from an Excel file. One data frame contains values that are separated with commas, that is,

     df1            df2
-----------      ------------
0   LFTEG42         X,Y,Z
1   JOCOROW         1,2
2   TLR_U01         I
3   PR_UDG5         O,M

df1 and df2 are my column names. My intention is to merge the two data frames and generate the following output:

 desired result
 ----------
0 LFTEG42X
1 LFTEG42Y
2 LFTEG42Z
3 JOCOROW1
4 JOCOROW2
5 TLR_U01I
6 .....
n PR_UDG5M

This is the code that I used but I ended up with the following result:

input_file = pd.ExcelFile \
    ('C:\\Users\\devel\\Desktop_12\\Testing\\latest_Calculation' + str(datetime.now()).split(' ')[0] + '.xlsx')

# convert the worksheets to dataframes
df1   = pd.read_excel(input_file, index_col=None, na_values=['NA'], parse_cols="H",
                                          sheetname="Analysis")
df2 = pd.read_excel(input_file, index_col=None, na_values=['NA'], parse_cols="I",
                                        sheetname="Analysis")

data_frames_merged = df1.append(df2, ignore_index=True)


current result
--------------
NaN      XYZ
NaN      1,2
NaN      I
...      ...
PR_UDG5  NaN

Questions

  1. Why did I end up receiving a NaN (not a number) value?
  2. How can I achieve my desired result of merging these two data frames with the comma values?

Upvotes: 1

Views: 1071

Answers (4)

piRSquared
piRSquared

Reputation: 294488

Setup

df1 = pd.DataFrame(dict(A='LFTEG42 JOCOROW TLR_U01 PR_UDG5'.split()))
df2 = pd.DataFrame(dict(A='X,Y,Z 1,2 I O,M'.split()))

Getting creative

df1.A.repeat(df2.A.str.count(',') + 1) + ','.join(df2.A).split(',')

0    LFTEG42X
0    LFTEG42Y
0    LFTEG42Z
1    JOCOROW1
1    JOCOROW2
2    TLR_U01I
3    PR_UDG5O
3    PR_UDG5M
dtype: object

Upvotes: 2

Vaishali
Vaishali

Reputation: 38415

Dirty one-liner

new_df = pd.concat([df1['df1'], df2['df2'].str.split(',', expand = True).stack()\
.reset_index(1,drop = True)], axis = 1).sum(1)


0    LFTEG42X
0    LFTEG42Y
0    LFTEG42Z
1    JOCOROW1
1    JOCOROW2
2    TLR_U01I
3    PR_UDG5O
3    PR_UDG5M

Upvotes: 3

DJK
DJK

Reputation: 9274

Also, similar to @vaishali except using melt

df = pd.concat([df1,df2['df2'].str.split(',',expand=True)],axis=1).melt(id_vars='df1').dropna().drop('variable',axis=1).sum(axis=1)

0    LFTEG42X
1    JOCOROW1
2    TLR_U01I
3    PR_UDG5O
4    LFTEG42Y
5    JOCOROW2
7    PR_UDG5M
8    LFTEG42Z

Upvotes: 2

BENY
BENY

Reputation: 323316

I break down the steps

df=pd.concat([df1,df2],axis=1)
df.df2=df.df2.str.split(',')

df=df.set_index('df1').df2.apply(pd.Series).stack().reset_index().drop('level_1',1).rename(columns={0:'df2'})
df['New']=df.df1+df.df2
df
Out[34]: 
       df1 df2       New
0  LFTEG42   X  LFTEG42X
1  LFTEG42   Y  LFTEG42Y
2  LFTEG42   Z  LFTEG42Z
3  JOCOROW   1  JOCOROW1
4  JOCOROW   2  JOCOROW2
5  TLR_U01   I  TLR_U01I
6  PR_UDG5   O  PR_UDG5O
7  PR_UDG5   M  PR_UDG5M

Data Input :

df1
Out[36]: 
       df1
0  LFTEG42
1  JOCOROW
2  TLR_U01
3  PR_UDG5


df2
Out[37]: 
     df2
0  X,Y,Z
1    1,2
2      I
3    O,M

Upvotes: 4

Related Questions