Reputation: 49
I currently have a wide dataframe that looks like this:
Index ID1 ID2 Foc_A Foc_B Foc_C Sat_A Sat_B Sat_C
0 r 1 10 15 17 100 105 107
1 r 2 20 25 27 110 115 117
2 s 1 30 35 37 120 125 127
3 s 2 40 45 47 130 135 137
Each entry has multiple identifier columns (ID1 and ID2). I then have two separate categories of measurements (Foc and Sat) each of which contains multiple identifiers (A, B, C)(the category identifiers are shared between categories). I'm eventually needing to plot it in a facet_grid with x and y as each category of measurement and separated by category identifier, so I'm trying to reshape it so that it looks like this:
Index ID1 ID2 Ch Foc Sat
0 r 1 A 10 100
1 r 1 B 15 105
2 r 1 C 17 107
3 r 2 A 20 110
4 r 2 B 25 115
5 r 2 C 27 117
6 s 1 A 30 120
7 s 1 B 35 125
8 s 1 C 37 127
I've been trying .melt, .pivot, and .stack but not understanding what I'm doing well enough to make headway.
Upvotes: 4
Views: 508
Reputation: 23099
Creating a multi_index
then using stack
df = df.set_index(['ID1','ID2'])
df.columns = df.columns.str.split('_',expand=True)
df1 = df.stack(1).reset_index().rename(columns={'level_2' : 'Ch'})
ID1 ID2 Ch Foc Sat
0 r 1 A 10 100
1 r 1 B 15 105
2 r 1 C 17 107
3 r 2 A 20 110
4 r 2 B 25 115
5 r 2 C 27 117
6 s 1 A 30 120
7 s 1 B 35 125
8 s 1 C 37 127
9 s 2 A 40 130
10 s 2 B 45 135
11 s 2 C 47 137
Upvotes: 0
Reputation: 323266
Let us do wide_to_long
df = pd.wide_to_long(df,['Foc','Sat'],i=['ID1','ID2'],j='Ch',sep='_',suffix='\w+').reset_index()
Out[168]:
ID1 ID2 Ch Foc Sat
0 r 1 A 10 100
1 r 1 B 15 105
2 r 1 C 17 107
3 r 2 A 20 110
4 r 2 B 25 115
5 r 2 C 27 117
6 s 1 A 30 120
7 s 1 B 35 125
8 s 1 C 37 127
9 s 2 A 40 130
10 s 2 B 45 135
11 s 2 C 47 137
Upvotes: 1
Reputation: 16683
you could also achieve this with .melt
, .groupby
and np.where
:
df = pd.melt(df, id_vars=['ID1','ID2','Foc_A', 'Foc_B', 'Foc_C'], var_name='Ch', value_name='Sat') \
.groupby(['ID1','ID2','Ch']).agg({'Foc_A':'max','Foc_B':'max', 'Foc_C':'max','Sat':'max'}).reset_index()
df['Foc'] = np.where((df['Ch'] == 'Sat_A'), df['Foc_A'], '')
df['Foc'] = np.where((df['Ch'] == 'Sat_B'), df['Foc_B'], df['Foc'])
df['Foc'] = np.where((df['Ch'] == 'Sat_C'), df['Foc_C'], df['Foc'])
df['Ch'] = df['Ch'].str.replace('Sat_', '')
df = df.drop(['Foc_A', 'Foc_B', 'Foc_C'], axis=1)
df
output:
ID1 ID2 Ch Sat Foc
0 r 1 A 100 10
1 r 1 B 105 15
2 r 1 C 107 17
3 r 2 A 110 20
4 r 2 B 115 25
5 r 2 C 117 27
6 s 1 A 120 30
7 s 1 B 125 35
8 s 1 C 127 37
9 s 2 A 130 40
10 s 2 B 135 45
11 s 2 C 137 47
Upvotes: 0
Reputation: 21719
You are thinking in the right way. You can do:
# melt the dataframe
d1 = df.set_index(['Index', 'ID1', 'ID2']).stack().reset_index()
# create separate column
d1[['flag', 'Ch']] = d1['level_3'].str.split('_', expand=True)
d1 = d1.drop('level_3', 1)
d1 = d1.rename(columns = {0: 'Foc'})
# expand the dataframe to wide
d2 = pd.pivot_table(d1, index=['Index', 'ID1', 'ID2', 'Ch'], columns=['flag']).reset_index()
# fix column names
d2.columns = ['Index', 'ID1', 'ID2', 'Ch', 'Foc', 'Sat']
print(d2.head())
Index ID1 ID2 Ch Foc Sat
0 0 r 1 A 10 100
1 0 r 1 B 15 105
2 0 r 1 C 17 107
3 1 r 2 A 20 110
4 1 r 2 B 25 115
Upvotes: 2
Reputation: 68146
I'd set ID columns to the index, split and expand the columns on the '_' character, then stack the dataframe:
from io import StringIO
import pandas
datafile = StringIO("""\
Index ID1 ID2 Foc_A Foc_B Foc_C Sat_A Sat_B Sat_C
0 r 1 10 15 17 100 105 107
1 r 2 20 25 27 110 115 117
2 s 1 30 35 37 120 125 127
3 s 2 40 45 47 130 135 137
""")
(
pandas.read_csv(datafile, sep='\s+')
.set_index(['ID1', 'ID2'])
.drop(columns='Index')
.pipe(lambda df:
df.set_axis(
df.columns.str.split('_', expand=True),
axis="columns"
)
)
.rename_axis([None, 'Ch'], axis='columns')
.stack(level='Ch')
.reset_index()
)
And that give me:
ID1 ID2 Ch Foc Sat
0 r 1 A 10 100
1 r 1 B 15 105
2 r 1 C 17 107
3 r 2 A 20 110
4 r 2 B 25 115
5 r 2 C 27 117
6 s 1 A 30 120
7 s 1 B 35 125
8 s 1 C 37 127
9 s 2 A 40 130
10 s 2 B 45 135
11 s 2 C 47 137
Upvotes: 1