Reputation: 3208
I have created a matrix:
items = [0, 1, 2, 3]
item_to_item = pd.DataFrame(index=items, columns=items)
I've filled it with values, for example:
   0  1  2  3
0  0  4  5  9
1  4  0  3  7
2  5  3  0  3
3  9  7  3  0
I want to create a data frame of all possible pairs (from [0, 1, 2, 3]) such that there are no pairs of the form (x, x), and if (x, y) is included, I don't want (y, x), because the matrix is symmetric and both hold the same value.
In the end I want the following DataFrame (or NumPy 2D array):
item, item, value
0 1 4
0 2 5
0 3 9
1 2 3
1 3 7
2 3 3
Upvotes: 1
Views: 449
Reputation: 221514
Here's a NumPy solution with np.triu_indices:
In [453]: item_to_item
Out[453]:
0 1 2 3
0 0 4 5 9
1 4 0 3 7
2 5 3 0 3
3 9 7 3 0
In [454]: r,c = np.triu_indices(len(items),1)
In [455]: pd.DataFrame(np.column_stack((r,c, item_to_item.values[r,c])))
Out[455]:
0 1 2
0 0 1 4
1 0 2 5
2 0 3 9
3 1 2 3
4 1 3 7
5 2 3 3
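For reference, the same idea as a self-contained script. This is only a sketch: the column names item_a, item_b and value are assumptions chosen for illustration, and the matrix is the example from the question.

```python
import numpy as np
import pandas as pd

items = [0, 1, 2, 3]
item_to_item = pd.DataFrame(
    [[0, 4, 5, 9],
     [4, 0, 3, 7],
     [5, 3, 0, 3],
     [9, 7, 3, 0]],
    index=items, columns=items,
)

# Positions strictly above the diagonal; k=1 skips the (x, x) pairs.
r, c = np.triu_indices(len(items), k=1)

# Map positions back to labels so this also works for non-integer items.
labels = np.asarray(items)
pairs = pd.DataFrame({
    "item_a": labels[r],                       # hypothetical column name
    "item_b": labels[c],                       # hypothetical column name
    "value": item_to_item.to_numpy()[r, c],    # hypothetical column name
})
print(pairs)
```

Building the frame from a dict keeps each column's own dtype, whereas np.column_stack coerces everything to one common dtype.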
Upvotes: 1
Reputation:
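NumPy's np.triu returns the upper triangle with all other elements set to zero. You can use that to construct a DataFrame and replace the zeros with NaNs so that they are dropped when you stack the columns (here df is the item_to_item matrix from the question; note that this also drops any genuine zero values above the diagonal):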
pd.DataFrame(np.triu(df), index=df.index, columns=df.columns).replace(0, np.nan).stack()
Out:
0  1    4.0
   2    5.0
   3    9.0
1  2    3.0
   3    7.0
2  3    3.0
dtype: float64
You can use reset_index at the end to convert the indices to columns.
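A variant of this approach, sketched under a few assumptions: instead of replacing zeros, it masks with a boolean upper triangle, so real zero values above the diagonal would survive. The column names are made up for illustration, and the explicit dropna() is there because whether stack drops NaNs on its own can depend on the pandas version.

```python
import numpy as np
import pandas as pd

items = [0, 1, 2, 3]
df = pd.DataFrame(
    [[0, 4, 5, 9],
     [4, 0, 3, 7],
     [5, 3, 0, 3],
     [9, 7, 3, 0]],
    index=items, columns=items,
)

# Boolean mask that is True strictly above the diagonal (k=1).
upper = np.triu(np.ones(df.shape, dtype=bool), k=1)

# Keep values where the mask is True, NaN elsewhere, then stack and drop NaNs.
stacked = df.where(upper).stack().dropna()

# Turn the two index levels into ordinary columns.
result = stacked.reset_index()
result.columns = ["item_a", "item_b", "value"]    # hypothetical names
result["value"] = result["value"].astype(int)     # masking upcast to float
print(result)
```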
Another alternative is to stack the DataFrame, reset the index, and then use a callable to keep only the rows where the first label is less than the second:
df.stack().reset_index()[lambda x: x['level_0'] < x['level_1']]
Out:
level_0 level_1 0
1 0 1 4
2 0 2 5
3 0 3 9
6 1 2 3
7 1 3 7
11 2 3 3
This one requires pandas 0.18.0.
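If callable indexing is not available, roughly the same filter can be written with an ordinary boolean mask. A minimal sketch, assuming the example matrix from the question and illustrative column names:

```python
import pandas as pd

items = [0, 1, 2, 3]
df = pd.DataFrame(
    [[0, 4, 5, 9],
     [4, 0, 3, 7],
     [5, 3, 0, 3],
     [9, 7, 3, 0]],
    index=items, columns=items,
)

# Long format: one row per (row label, column label, value).
stacked = df.stack().reset_index()
stacked.columns = ["item_a", "item_b", "value"]   # hypothetical names

# Keep each unordered pair once and skip the diagonal.
pairs = stacked[stacked["item_a"] < stacked["item_b"]].reset_index(drop=True)
print(pairs)
```

The comparison relies on the labels being orderable; for labels without a natural order, the position-based np.triu_indices approach is probably the safer choice.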
Upvotes: 2