Reputation: 39
I happen to have a dataset that looks like this:
A-B A-B A-B A-B A-B B-A B-A B-A B-A B-A
2 3 2 4 5 3.1 3 2 2.5 2.6
NaN 3.2 3.3 3.5 5.2 NaN 4 2.7 3.2 5
NaN NaN 4.1 4 6 NaN NaN 4 4.1 6
NaN NaN NaN 4.2 5.1 NaN NaN NaN 3.5 5.2
NaN NaN NaN NaN 6 NaN NaN NaN NaN 5.7
It's very bad, I know. But what I would like to obtain is:
A-B B-A
2 3.1
3.2 4
4.1 4
4.2 3.5
6 5.7
Which are the values on the "diagonals"
Is there a way I can get something like this?
Upvotes: 2
Views: 277
Reputation: 28644
Another option is to convert to long form, and then drop duplicates: this can be achieved with pivot_longer
from pyjanitor
:
# pip install pyjanitor
import pandas as pd
import janitor
(
df
.pivot_longer(names_to=".value",
names_pattern=r"(.+)",
ignore_index=False)
.dropna()
.loc[lambda df: ~df.index.duplicated()]
)
A-B B-A
0 2.0 3.1
1 3.2 4.0
2 4.1 4.0
3 4.2 3.5
4 6.0 5.7
@mozway's solution should be faster though, as you avoid building large number of rows only to prune them, which is what this option does.
Upvotes: 0
Reputation: 260455
You could use groupby
and a dictionary comprehension with numpy.diag
:
df2 = pd.DataFrame({x: np.diag(g) for x, g in df.groupby(level=0, axis=1)})
output:
A-B B-A
0 2.0 3.1
1 3.2 4.0
2 4.1 4.0
3 4.2 3.5
4 6.0 5.7
Upvotes: 4