Reputation: 119

How to Change the Structure of Pandas Dataframe in Python?

I have a current Pandas DataFrame in the format below (see Current DataFrame) but I want to change the structure of it to look like the Desired DataFrame below. The top row of titles is longitudes and the first column of titles is latitudes.

Current DataFrame:

       E0    E1    E2    E3    E4
LAT                              
89   0.01  0.01  0.02  0.01  0.00
88   0.01  0.00  0.00  0.01  0.00
87   0.00  0.02  0.01  0.02  0.01
86   0.02  0.00  0.03  0.02  0.00
85   0.00  0.00  0.00  0.01  0.03

Code to build it:

df = pd.DataFrame({
    'LAT': [89, 88, 87, 86, 85],
    'E0': [0.01, 0.01, 0.0, 0.02, 0.0],
    'E1': [0.01, 0.0, 0.02, 0.0, 0.0],
    'E2': [0.02, 0.0, 0.01, 0.03, 0.0],
    'E3': [0.01, 0.01, 0.02, 0.02, 0.01],
    'E4': [0.0, 0.0, 0.01, 0.0, 0.03]
}).set_index('LAT')

Desired DataFrame:

LAT  LON     R
 89    0  0.01 
 89    1  0.01
 89    2  0.02
 89    3  0.01
 89    4  0.00
 88    0  0.01
 88    1  0.00
 88    2  0.00
 88    3  0.01
 88    4  0.00
 87    0  0.00
 87    1  0.02
 87    2  0.01
 87    3  0.02
 87    4  0.01
 86    0  0.02
 86    1  0.00
 86    2  0.03
 86    3  0.02
 86    4  0.00
 85    0  0.00
 85    1  0.00
 85    2  0.00
 85    3  0.01
 85    4  0.03

Upvotes: 2

Answers (4)

Karthik S

Reputation: 11584

Another approach, does this work:

pd.wide_to_long(df.reset_index(), ['E'], i = 'LAT', j = 'LON').reset_index().sort_values(by = ['LAT','LON'])
LAT LON E
4   85  0   0.00
9   85  1   0.00
14  85  2   0.00
19  85  3   0.01
24  85  4   0.03
3   86  0   0.02
8   86  1   0.00
13  86  2   0.03
18  86  3   0.02
23  86  4   0.00
2   87  0   0.00
7   87  1   0.02
12  87  2   0.01
17  87  3   0.02
22  87  4   0.01
1   88  0   0.01
6   88  1   0.00
11  88  2   0.00
16  88  3   0.01
21  88  4   0.00
0   89  0   0.01
5   89  1   0.01
10  89  2   0.02
15  89  3   0.01
20  89  4   0.00

Upvotes: 1

Henry Ecker

Reputation: 35646

Try with stack + str.extract:

new_df = (
    df.stack()
        .reset_index(name='R')
        .rename(columns={'level_1': 'LON'})
)
new_df['LON'] = new_df['LON'].str.extract(r'(\d+$)').astype(int)

Or with pd.wide_to_long + reindex:

new_df = df.reset_index()
new_df = (
    pd.wide_to_long(new_df, stubnames='E', i='LAT', j='LON')
        .reindex(new_df['LAT'], level=0)
        .rename(columns={'E': 'R'})
        .reset_index()
)

new_df:

    LAT  LON     R
0    89    0  0.01
1    89    1  0.01
2    89    2  0.02
3    89    3  0.01
4    89    4  0.00
5    88    0  0.01
6    88    1  0.00
7    88    2  0.00
8    88    3  0.01
9    88    4  0.00
10   87    0  0.00
11   87    1  0.02
12   87    2  0.01
13   87    3  0.02
14   87    4  0.01
15   86    0  0.02
16   86    1  0.00
17   86    2  0.03
18   86    3  0.02
19   86    4  0.00
20   85    0  0.00
21   85    1  0.00
22   85    2  0.00
23   85    3  0.01
24   85    4  0.03

Upvotes: 2

sammywemmy

Reputation: 28689

You could solve it with pivot_longer from pyjanitor:

# pip install pyjanitor
import janitor
import pandas as pd
df.pivot_longer(index = None,
                names_to = 'LON', 
                values_to = "R", 
                names_pattern = r".(.)", 
                sort_by_appearance = True,
                ignore_index = False).reset_index()

    LAT LON     R
0    89   0  0.01
1    89   1  0.01
2    89   2  0.02
3    89   3  0.01
4    89   4  0.00
5    88   0  0.01
6    88   1  0.00
7    88   2  0.00
8    88   3  0.01
9    88   4  0.00
10   87   0  0.00
11   87   1  0.02
12   87   2  0.01
13   87   3  0.02
14   87   4  0.01
15   86   0  0.02
16   86   1  0.00
17   86   2  0.03
18   86   3  0.02
19   86   4  0.00
20   85   0  0.00
21   85   1  0.00
22   85   2  0.00
23   85   3  0.01
24   85   4  0.03

Here we are only interested in the numbers that are at the end of the columns - we get this by passing a regular expression to names_pattern.

You can avoid pyjanitor altogether by using melt and rename:

(df.rename(columns=lambda col: col[-1])
   .melt(var_name='LON', value_name='R', ignore_index=False)
 )

    LON     R
LAT          
89    0  0.01
88    0  0.01
87    0  0.00
86    0  0.02
85    0  0.00
89    1  0.01
88    1  0.00
87    1  0.02
86    1  0.00
85    1  0.00
89    2  0.02
88    2  0.00
87    2  0.01
86    2  0.03
85    2  0.00
89    3  0.01
88    3  0.01
87    3  0.02
86    3  0.02
85    3  0.01
89    4  0.00
88    4  0.00
87    4  0.01
86    4  0.00
85    4  0.03

Upvotes: 1

Aaron Wright

Reputation: 76

Quick and dirty.

Pad your LAT with you LON in a list of tuple pairs.

[  
   (89.0, 0.01),
   (89.1, 0.01),
   (89.2, 0.02)
]

Im sure someone can break down a way to organize it like you want... but from what I know you need a unique ID data point for most data in a query structure.

OR:

If you aren't putting this back into a db, then maybe you can use a dict something like this:

{ '89' : { '0' : 0.01,
           '1' : 0.01,
           '2' : 0.02 .....
         }
}

You can then get the data with

dpoint = data['89']['0']

assert dpoint == 0.01
\\ True

dpoint = data['89']['2']

assert dpoint == 0.02
\\ True

Upvotes: 0

How to Change the Structure of Pandas Dataframe in Python?

Answers (4)

Related Questions