Reputation: 7888
There are a lot of questions out there with similar titles, but none of them has helped me solve the issue I'm having with my dataset.
Dataset:
ID Country Type Region Gender IA01_Raw IA01_Class1 IA01_Class2 IA02_Raw IA02_Class1 IA02_Class2 QA_Include QA_Comments
SC1 France A Europe Male 4 8 1 J 4 1 yes N/A
SC2 France A Europe Female 2 7 2 Q 6 4 yes N/A
SC3 France B Europe Male 3 7 2 K 8 2 yes N/A
SC4 France A Europe Male 4 8 2 A 2 1 yes N/A
SC5 France B Europe Male 1 7 1 F 1 3 yes N/A
ID6 France A Europe Male 2 8 1 R 3 7 yes N/A
ID7 France B Europe Male 2 8 1 Q 4 6 yes N/A
UC8 France B Europe Male 4 8 2 P 4 2 yes N/A
Required output:
ID Country Type Region Gender IA Raw Class1 Class2 QA_Include QA_Comments
SC1 France A Europe Male 01 K 8 1 yes N/A
SC1 France A Europe Male 01 L 8 1 yes N/A
SC1 France A Europe Male 01 P 8 1 yes N/A
SC1 France A Europe Male 02 Q 8 1 yes N/A
SC1 France A Europe Male 02 R 8 1 yes N/A
SC1 France A Europe Male 02 T 8 1 yes N/A
SC1 France A Europe Male 03 G 8 1 yes N/A
SC1 France A Europe Male 03 R 8 1 yes N/A
SC1 France A Europe Male 03 G 8 1 yes N/A
SC1 France A Europe Male 04 K 8 1 yes N/A
SC1 France A Europe Male 04 A 8 1 yes N/A
SC1 France A Europe Male 04 P 8 1 yes N/A
SC1 France A Europe Male 05 R 8 1 yes N/A
....
In the dataset I have columns named IA[X]_NAME, where X = 1..9 and NAME = Raw, Class1 or Class2.
What I am trying to do is transpose these columns so that the result looks like the table shown in Required output, i.e. IA holds the X value, and Raw, Class1 and Class2 hold their respective values.
To achieve this I sliced the columns as follows:
idVars = list(excel_df_final.columns[0:40]) + list(excel_df_final.columns[472:527]) #These contain columns like ID, Country, Type etc
valueVars = excel_df_final.columns[41:472].tolist() #All the IA_ columns
I don't know if this step was necessary, but it gave me exactly the column slices I wanted. However, when I pass them to melt it does not work properly. I have tried almost every method suggested in other questions.
pd.melt(excel_df_final, id_vars=idVars,value_vars=valueVars)
I've also tried this:
excel_df_final.set_index(idVars)[41:472].unstack()
but it didn't work. Here is my wide_to_long attempt, which also didn't work:
pd.wide_to_long(excel_df_final, stubnames = ['IA', 'Raw', 'Class1', 'Class2'], i=idVars, j=valueVars)
The error I got for wide to long is:
ValueError: operands could not be broadcast together with shapes (95,) (431,)
My real dataset has 526 columns, which is why I divided them into two lists: one contains the 95 column names that serve as i, and the other contains the remaining 431 columns that I need to show as rows, as in the sample dataset.
Upvotes: 5
Views: 138
Reputation: 4882
You can use pd.lreshape:
pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df),IA09=['09']*len(df)),
{'IA': ['IA01', 'IA02','IA09'],
'Raw': ['IA01_Raw','IA02_Raw','IA09_Raw'],
'Class1': ['IA01_Class1','IA02_Class1','IA09_Class1'],
'Class2': ['IA01_Class2', 'IA02_Class2','IA09_Class2']
})
Edit:
pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df),IA09=['09']*len(df)),
{'IA': ['IA01', 'IA02','IA09'],
'Raw': ['IA01_Raw_baseline','IA02_Raw_midline','IA09_Raw_whatever'],
'Class1': ['IA01_Class1_baseline','IA02_Class1_midline','IA09_Class1_whatever'],
'Class2': ['IA01_Class2_baseline', 'IA02_Class2_midline','IA09_Class2_whatever']
})
Edit: for each of the Raw/Class1/Class2 output columns, just add the names of whichever input columns you want to the corresponding list inside the dictionary.
Documentation for this is not available; use help(pd.lreshape) or refer here.
Output:
Country Gender ID QA_Comments QA_Include Region Type IA Raw Class1 Class2
0 France Male SC1 NaN yes Europe A 01 4 8 1
1 France Female SC2 NaN yes Europe A 01 2 7 2
2 France Male SC3 NaN yes Europe B 01 3 7 2
3 France Male SC4 NaN yes Europe A 01 4 8 2
4 France Male SC5 NaN yes Europe B 01 1 7 1
5 France Male ID6 NaN yes Europe A 01 2 8 1
6 France Male ID7 NaN yes Europe B 01 2 8 1
7 France Male UC8 NaN yes Europe B 01 4 8 2
8 France Male SC1 NaN yes Europe A 02 J 4 1
9 France Female SC2 NaN yes Europe A 02 Q 6 4
10 France Male SC3 NaN yes Europe B 02 K 8 2
11 France Male SC4 NaN yes Europe A 02 A 2 1
12 France Male SC5 NaN yes Europe B 02 F 1 3
13 France Male ID6 NaN yes Europe A 02 R 3 7
14 France Male ID7 NaN yes Europe B 02 Q 4 6
15 France Male UC8 NaN yes Europe B 02 P 4 2
16 France Male SC1 NaN yes Europe A 09 W 6 3
17 France Female SC2 NaN yes Europe A 09 X 5 2
18 France Male SC3 NaN yes Europe B 09 Y 5 5
19 France Male SC4 NaN yes Europe A 09 P 5 2
20 France Male SC5 NaN yes Europe B 09 T 5 2
21 France Male ID6 NaN yes Europe A 09 I 5 2
22 France Male ID7 NaN yes Europe B 09 A 8 2
23 France Male UC8 NaN yes Europe B 09 K 7 5
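With several hundred IA columns, typing the dictionary by hand gets tedious. Below is a minimal sketch of building it from the column names instead. It assumes every header follows the IA<number>_<Raw|Class1|Class2> pattern from the question and that the numbers are zero-padded to the same width; the small frame and the names `pattern`, `numbers`, `labels` and `groups` are made up for illustration.

```python
import re
import pandas as pd

# Tiny stand-in for the real frame (illustrative values only).
df = pd.DataFrame({
    "ID": ["SC1", "SC2"],
    "Country": ["France", "France"],
    "IA01_Raw": [4, 2], "IA01_Class1": [8, 7], "IA01_Class2": [1, 2],
    "IA02_Raw": ["J", "Q"], "IA02_Class1": [4, 6], "IA02_Class2": [1, 4],
})

# Collect the distinct IA numbers ("01", "02", ...) found in the headers.
pattern = re.compile(r"^IA(\d+)_(?:Raw|Class1|Class2)$")
numbers = sorted({m.group(1) for m in map(pattern.match, df.columns) if m})

# Helper columns IA01="01", IA02="02", ... carry the IA label into the output.
labels = {f"IA{n}": n for n in numbers}

# Every list must have the same length and follow the same IA order.
groups = {
    "IA": list(labels),
    "Raw": [f"IA{n}_Raw" for n in numbers],
    "Class1": [f"IA{n}_Class1" for n in numbers],
    "Class2": [f"IA{n}_Class2" for n in numbers],
}

long_df = pd.lreshape(df.assign(**labels), groups)
print(long_df)
```

Note that pd.lreshape drops rows containing missing values in the reshaped columns by default; pass dropna=False if you need to keep them.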
Upvotes: 1
Reputation: 402813
This will get you started. The essence is using set_index, converting the columns to a MultiIndex, then stack. Better solutions may exist, but I would do it this way because it is an easy step to your output.
# Set the index with columns that we don't want to "transpose"
df2 = df.set_index([
'ID', 'Country', 'Type', 'Region', 'Gender', 'QA_Include', 'QA_Comments'])
# Convert headers to MultiIndex -- this is so we can melt IA values
df2.columns = pd.MultiIndex.from_tuples(map(tuple, df2.columns.str.split('_')))
# Call stack to replicate data, then reset the index
out = df2.stack(level=0).reset_index().rename({'level_7': 'IA'}, axis=1)
out
ID Country Type Region Gender QA_Include QA_Comments IA Class1 Class2 Raw
0 SC1 France A Europe Male yes NaN IA01 8 1 4
1 SC1 France A Europe Male yes NaN IA02 4 1 J
2 SC2 France A Europe Female yes NaN IA01 7 2 2
3 SC2 France A Europe Female yes NaN IA02 6 4 Q
4 SC3 France B Europe Male yes NaN IA01 7 2 3
5 SC3 France B Europe Male yes NaN IA02 8 2 K
6 SC4 France A Europe Male yes NaN IA01 8 2 4
7 SC4 France A Europe Male yes NaN IA02 2 1 A
8 SC5 France B Europe Male yes NaN IA01 7 1 1
9 SC5 France B Europe Male yes NaN IA02 1 3 F
10 ID6 France A Europe Male yes NaN IA01 8 1 2
11 ID6 France A Europe Male yes NaN IA02 3 7 R
12 ID7 France B Europe Male yes NaN IA01 8 1 2
13 ID7 France B Europe Male yes NaN IA02 4 6 Q
14 UC8 France B Europe Male yes NaN IA01 8 2 4
15 UC8 France B Europe Male yes NaN IA02 4 2 P
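If you would rather stay with pd.wide_to_long, here is a rough sketch of how it might be made to work. Two things in the original attempt get in the way: j is meant to be the name of a single new column rather than a list of columns, and the function looks for headers shaped like <stub><sep><suffix>, so IA01_Raw has to be flipped to Raw_IA01 first. The small frame and the name `renamed` are illustrative, and the sketch assumes ID uniquely identifies each row.

```python
import re
import pandas as pd

# Tiny stand-in for the real frame (illustrative values only).
df = pd.DataFrame({
    "ID": ["SC1", "SC2"],
    "Country": ["France", "France"],
    "QA_Include": ["yes", "yes"],
    "IA01_Raw": [4, 2], "IA01_Class1": [8, 7], "IA01_Class2": [1, 2],
    "IA02_Raw": ["J", "Q"], "IA02_Class1": [4, 6], "IA02_Class2": [1, 4],
})

# Flip "IA01_Raw" -> "Raw_IA01" so that Raw/Class1/Class2 become the stubs.
# Headers that do not start with IA<digits>_ (ID, QA_Include, ...) are untouched.
renamed = df.rename(columns=lambda c: re.sub(r"^(IA\d+)_(\w+)$", r"\2_\1", c))

long_df = pd.wide_to_long(
    renamed,
    stubnames=["Raw", "Class1", "Class2"],
    i="ID",            # must uniquely identify each row
    j="IA",            # name of the new column holding the suffix
    sep="_",
    suffix=r"IA\d+",   # keeps labels such as "IA01" as strings
).reset_index()

print(long_df)
```

The remaining columns (Country, QA_Include, and so on) are carried along automatically, so only the identifying column needs to be passed as i.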
Upvotes: 2