kooo1000
kooo1000

Reputation: 39

How can I extract different diagonals of a DataFrame?

I happen to have a dataset that looks like this:

A-B     A-B     A-B     A-B     A-B     B-A     B-A     B-A     B-A     B-A  
 2       3       2       4       5      3.1      3       2      2.5     2.6
NaN     3.2     3.3     3.5     5.2     NaN      4      2.7     3.2      5
NaN     NaN     4.1      4       6      NaN     NaN      4      4.1      6
NaN     NaN     NaN     4.2     5.1     NaN     NaN     NaN     3.5     5.2
NaN     NaN     NaN     NaN      6      NaN     NaN     NaN     NaN     5.7

It's very bad, I know. But what I would like to obtain is:

A-B     B-A
 2      3.1
3.2      4
4.1      4
4.2     3.5
 6      5.7

Which are the values on the "diagonals"

Is there a way I can get something like this?

Upvotes: 2

Views: 277

Answers (2)

sammywemmy
sammywemmy

Reputation: 28644

Another option is to convert to long form, and then drop duplicates: this can be achieved with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor
(
df
.pivot_longer(names_to=".value", 
             names_pattern=r"(.+)", 
             ignore_index=False)
.dropna()
.loc[lambda df: ~df.index.duplicated()]
)

   A-B  B-A
0  2.0  3.1
1  3.2  4.0
2  4.1  4.0
3  4.2  3.5
4  6.0  5.7

@mozway's solution should be faster though, as you avoid building large number of rows only to prune them, which is what this option does.

Upvotes: 0

mozway
mozway

Reputation: 260455

You could use groupby and a dictionary comprehension with numpy.diag:

df2 = pd.DataFrame({x: np.diag(g) for x, g in df.groupby(level=0, axis=1)})

output:

   A-B  B-A
0  2.0  3.1
1  3.2  4.0
2  4.1  4.0
3  4.2  3.5
4  6.0  5.7

Upvotes: 4

Related Questions