Reputation: 119
I have a current Pandas DataFrame in the format below (see Current DataFrame) but I want to change the structure of it to look like the Desired DataFrame below. The top row of titles is longitudes and the first column of titles is latitudes.
Current DataFrame:
E0 E1 E2 E3 E4
LAT
89 0.01 0.01 0.02 0.01 0.00
88 0.01 0.00 0.00 0.01 0.00
87 0.00 0.02 0.01 0.02 0.01
86 0.02 0.00 0.03 0.02 0.00
85 0.00 0.00 0.00 0.01 0.03
Code to build it:
df = pd.DataFrame({
'LAT': [89, 88, 87, 86, 85],
'E0': [0.01, 0.01, 0.0, 0.02, 0.0],
'E1': [0.01, 0.0, 0.02, 0.0, 0.0],
'E2': [0.02, 0.0, 0.01, 0.03, 0.0],
'E3': [0.01, 0.01, 0.02, 0.02, 0.01],
'E4': [0.0, 0.0, 0.01, 0.0, 0.03]
}).set_index('LAT')
Desired DataFrame:
LAT LON R
89 0 0.01
89 1 0.01
89 2 0.02
89 3 0.01
89 4 0.00
88 0 0.01
88 1 0.00
88 2 0.00
88 3 0.01
88 4 0.00
87 0 0.00
87 1 0.02
87 2 0.01
87 3 0.02
87 4 0.01
86 0 0.02
86 1 0.00
86 2 0.03
86 3 0.02
86 4 0.00
85 0 0.00
85 1 0.00
85 2 0.00
85 3 0.01
85 4 0.03
Upvotes: 2
Views: 244
Reputation: 11584
Another approach, does this work:
pd.wide_to_long(df.reset_index(), ['E'], i = 'LAT', j = 'LON').reset_index().sort_values(by = ['LAT','LON'])
LAT LON E
4 85 0 0.00
9 85 1 0.00
14 85 2 0.00
19 85 3 0.01
24 85 4 0.03
3 86 0 0.02
8 86 1 0.00
13 86 2 0.03
18 86 3 0.02
23 86 4 0.00
2 87 0 0.00
7 87 1 0.02
12 87 2 0.01
17 87 3 0.02
22 87 4 0.01
1 88 0 0.01
6 88 1 0.00
11 88 2 0.00
16 88 3 0.01
21 88 4 0.00
0 89 0 0.01
5 89 1 0.01
10 89 2 0.02
15 89 3 0.01
20 89 4 0.00
Upvotes: 1
Reputation: 35646
Try with stack
+ str.extract
:
new_df = (
df.stack()
.reset_index(name='R')
.rename(columns={'level_1': 'LON'})
)
new_df['LON'] = new_df['LON'].str.extract(r'(\d+$)').astype(int)
Or with pd.wide_to_long
+ reindex
:
new_df = df.reset_index()
new_df = (
pd.wide_to_long(new_df, stubnames='E', i='LAT', j='LON')
.reindex(new_df['LAT'], level=0)
.rename(columns={'E': 'R'})
.reset_index()
)
new_df
:
LAT LON R
0 89 0 0.01
1 89 1 0.01
2 89 2 0.02
3 89 3 0.01
4 89 4 0.00
5 88 0 0.01
6 88 1 0.00
7 88 2 0.00
8 88 3 0.01
9 88 4 0.00
10 87 0 0.00
11 87 1 0.02
12 87 2 0.01
13 87 3 0.02
14 87 4 0.01
15 86 0 0.02
16 86 1 0.00
17 86 2 0.03
18 86 3 0.02
19 86 4 0.00
20 85 0 0.00
21 85 1 0.00
22 85 2 0.00
23 85 3 0.01
24 85 4 0.03
Upvotes: 2
Reputation: 28689
You could solve it with pivot_longer from pyjanitor:
# pip install pyjanitor
import janitor
import pandas as pd
df.pivot_longer(index = None,
names_to = 'LON',
values_to = "R",
names_pattern = r".(.)",
sort_by_appearance = True,
ignore_index = False).reset_index()
LAT LON R
0 89 0 0.01
1 89 1 0.01
2 89 2 0.02
3 89 3 0.01
4 89 4 0.00
5 88 0 0.01
6 88 1 0.00
7 88 2 0.00
8 88 3 0.01
9 88 4 0.00
10 87 0 0.00
11 87 1 0.02
12 87 2 0.01
13 87 3 0.02
14 87 4 0.01
15 86 0 0.02
16 86 1 0.00
17 86 2 0.03
18 86 3 0.02
19 86 4 0.00
20 85 0 0.00
21 85 1 0.00
22 85 2 0.00
23 85 3 0.01
24 85 4 0.03
Here we are only interested in the numbers that are at the end of the columns - we get this by passing a regular expression to names_pattern
.
You can avoid pyjanitor
altogether by using melt
and rename
:
(df.rename(columns=lambda col: col[-1])
.melt(var_name='LON', value_name='R', ignore_index=False)
)
LON R
LAT
89 0 0.01
88 0 0.01
87 0 0.00
86 0 0.02
85 0 0.00
89 1 0.01
88 1 0.00
87 1 0.02
86 1 0.00
85 1 0.00
89 2 0.02
88 2 0.00
87 2 0.01
86 2 0.03
85 2 0.00
89 3 0.01
88 3 0.01
87 3 0.02
86 3 0.02
85 3 0.01
89 4 0.00
88 4 0.00
87 4 0.01
86 4 0.00
85 4 0.03
Upvotes: 1
Reputation: 76
Quick and dirty.
Pad your LAT with you LON in a list of tuple pairs.
[
(89.0, 0.01),
(89.1, 0.01),
(89.2, 0.02)
]
Im sure someone can break down a way to organize it like you want... but from what I know you need a unique ID data point for most data in a query structure.
OR:
If you aren't putting this back into a db, then maybe you can use a dict something like this:
{ '89' : { '0' : 0.01,
'1' : 0.01,
'2' : 0.02 .....
}
}
You can then get the data with
dpoint = data['89']['0']
assert dpoint == 0.01
\\ True
dpoint = data['89']['2']
assert dpoint == 0.02
\\ True
Upvotes: 0