Reputation: 565
I have a dataframe that looks like this:
| A | B | C | D
0 | a | b | c | d | e | f | g | h
1 | a | b | c | d | e | f | NaN | NaN
2 | a | b | NaN | NaN | e | f | NaN | NaN
3 | a | b | NaN | NaN | NaN | NaN | NaN | NaN
and I want to change it to this:
OBJ VAL1 VAL2
0 A a b
1 A a b
2 A a b
3 A a b
4 A a b
5 A a b
6 B c d
7 B c d
8 C e f
9 C e f
10 C e f
11 D g h
so the multindex will convert to a column as well.
any help please?
and is there any good tutorial that explains exactly those kind of stuff so I will be able to do it and not just try and error?
Thank you
EDIT: My first originally dataframe looks like this:
A B C D
0 (a,b) (c,d) (e,f) (g,h)
1 (a,b) (c,d) (e,f) NaN
2 (a,b) NaN (e,f) NaN
3 (a,b) NaN NaN NaN
so in each cell there is a tuple.
Upvotes: 1
Views: 181
Reputation: 862611
You can use DataFrame.from_records
first, then double reset_index
and if necessary sort_values
for sorting all columns:
df = pd.DataFrame({"A": [('a','b'),('a','b'),('a','b'),('a','b')],
'B': [('c','d'),('c','d'), np.nan,np.nan],
'C':[('e','f'),('e','f'),('e','f'),np.nan],
'D':[('g','h'),np.nan,np.nan,np.nan]})
print (df)
A B C D
0 (a, b) (c, d) (e, f) (g, h)
1 (a, b) (c, d) (e, f) NaN
2 (a, b) NaN (e, f) NaN
3 (a, b) NaN NaN NaN
stacked = df.stack()
df1 = pd.DataFrame.from_records(stacked.tolist(), index = stacked.index)
.reset_index(level=0, drop=True)
.reset_index()
.sort_values(['index',0,1])
df1.columns = ['OBJ','VAL1','VAL2']
print (df1)
OBJ VAL1 VAL2
0 A a b
4 A a b
7 A a b
9 A a b
1 B c d
5 B c d
2 C e f
6 C e f
8 C e f
3 D g h
If your DataFrame
is with MultiIndex
in columns, need stack
first:
stacked = df.stack()
df1 = pd.DataFrame.from_records(stacked.tolist(), index = stacked.index) \
.unstack(1) \
.swaplevel(0, 1, 1) \
.sort_index(axis=1) \
.replace({None:np.nan})
print (df1)
A B C D
0 1 0 1 0 1 0 1
0 a b c d e f g h
1 a b c d e f NaN NaN
2 a b NaN NaN e f NaN NaN
3 a b NaN NaN NaN NaN NaN NaN
df2 = df1.stack(0)
.reset_index(level=0, drop=True)
.reset_index()
.sort_values(['index',0,1])
df2.columns = ['OBJ','VAL1','VAL2']
print (df2)
OBJ VAL1 VAL2
0 A a b
4 A a b
7 A a b
9 A a b
1 B c d
5 B c d
2 C e f
6 C e f
8 C e f
3 D g h
Upvotes: 2