Reputation: 128
I have two Pandas' data frames
df1
items view
0 A|B|C 02-10-2015
1 D|E 02-15-2015
df2
item num val
0 A 1 10
1 B 3 2
2 C 8 9
3 D 9 13
4 E 2 22
I want to mere these frames to get
df
view num1 val1 num2 val2 num3 val3
0 02-10-2015 1 10 3 2 8 9
1 02-15-2015 9 13 2 22 na na
My current approach is to split df1.items
using
df3 = pd.DataFrame(df1['items'].str.split('|').tolist())
which results in
0 1 2
0 A B C
1 D E None
and finally merges each individual column and concatenates them with the original df1
x = pd.merge(df3[[0]], df2, how='left', on='item')
y = pd.merge(df3[[1]], df2, how='left', on='item')
z = pd.merge(df3[[2]], df2, how='left', on='item')
pd.concat([df1, x.ix[:,1:],y.ix[:,1:],z.ix[:,1:]], axis=1)
The code works but it doesn't seem right to me, I would be happy if anyone can point out a proper way to achieve the same results.
Thank you in advance!
Upvotes: 2
Views: 1836
Reputation: 375715
Note: str.split
has a return_type
argument:
In [11]: res = df1['items'].str.split("|", return_type='frame')
In [12]: res
Out[12]:
0 1 2
0 A B C
1 D E NaN
In [13]: res.index = df1['view']
In [14]: res
Out[14]:
0 1 2
view
02-10-2015 A B C
02-15-2015 D E NaN
I think a nicer, more general way, to do this is using stack/unstack:
In [15]: res = res.stack()
In [16]: res
Out[16]:
view
02-10-2015 0 A
1 B
2 C
02-15-2015 0 D
1 E
dtype: object
Now you can either merge, or if you're lucky just switch out the index:
In [17]: df2 = df2.set_index('item') # could just drop this column
In [18]: df2.loc[res] # reorder, may not be required
Out[18]:
num val
item
A 1 10
B 3 2
C 8 9
D 9 13
E 2 22
Now the magic:
In [21]: df2.index = r.index
In [22]: df2
Out[22]:
num val
view
02-10-2015 0 1 10
1 3 2
2 8 9
02-15-2015 0 9 13
1 2 22
In [23]: df2.unstack()
Out[23]:
num val
0 1 2 0 1 2
view
02-10-2015 1 3 8 10 2 9
02-15-2015 9 2 NaN 13 22 NaN
As desired (with a MultiIndex columns, which is what you want).
Note: If you have duplicates (A, B, Cs) you're going to need to merge (which is a little fiddlier, but could be cleaned up). Before [21]:
In [31]: df2.merge(res.to_frame(), left_index=True, right_on=0).unstack()
Out[31]:
num val 0
0 1 2 0 1 2 0 1 2
view
02-10-2015 1 3 8 10 2 9 A B C
02-15-2015 9 2 NaN 13 22 NaN D E NaN
Upvotes: 4