Reputation: 615
Let's say I have a df
such as this:
df = pd.DataFrame({'A': [1,2,3,4,5], 'A_z': [2,3,4,5,6], 'B': [3,4,5,6,7], 'B_z': [4,5,6,7,8],
'C': [5,6,7,8,9], 'C_z': [6,7,8,9,10]})
Which looks like this:
A A_z B B_z C C_z
0 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
What I'm looking to do is create a new df
and for each letter (A,B,C) append this new df
vertically with the data from the two columns per letter so that it looks like this:
Letter Letter_z
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
5 3 4
6 4 5
7 5 6
8 6 7
9 7 8
10 5 6
11 6 7
12 7 8
13 8 9
14 9 10
As far as I'm concerned something like this should work fine:
for col in df.columns:
if col[-1] != 'z':
new_df = new_df.append(df[[col, col + '_z']])
However this results in the following mess:
A A_z B B_z C C_z
0 1.0 2.0 NaN NaN NaN NaN
1 2.0 3.0 NaN NaN NaN NaN
2 3.0 4.0 NaN NaN NaN NaN
3 4.0 5.0 NaN NaN NaN NaN
4 5.0 6.0 NaN NaN NaN NaN
0 NaN NaN 3.0 4.0 NaN NaN
1 NaN NaN 4.0 5.0 NaN NaN
2 NaN NaN 5.0 6.0 NaN NaN
3 NaN NaN 6.0 7.0 NaN NaN
4 NaN NaN 7.0 8.0 NaN NaN
0 NaN NaN NaN NaN 5.0 6.0
1 NaN NaN NaN NaN 6.0 7.0
2 NaN NaN NaN NaN 7.0 8.0
3 NaN NaN NaN NaN 8.0 9.0
4 NaN NaN NaN NaN 9.0 10.0
What am I doing wrong? Any help would be really appreciated, cheers.
EDIT:
After the kind help from jezrael the renaming of the columns in his answer got me thinking about a possible way to do it using my original train of thought.
I can now also achieve the new df
I want using the following:
for col in df:
if col[-1] != 'z':
d = df[[col, col + '_z']]
d.columns = ['Letter', 'Letter_z']
new_df = new_df.append(d)
The different columns names were clearly what was causing the problem which is something I wasn't aware of at the time. Hope this helps anyone.
Upvotes: 2
Views: 219
Reputation: 862511
One ide is use Series.str.split
with expand=True
for MultiIndex
, then use rename
for avoid NaN
s and finally new columns names, reshape by DataFrame.stack
, sort for correct order by DataFrame.sort_index
and last remove MultiIndex
:
df.columns = df.columns.str.split('_', expand=True)
df = df.rename(columns=lambda x:'Letter_z' if x == 'z' else 'Letter', level=1)
df = df.stack(0).sort_index(level=[1,0]).reset_index(drop=True)
print (df)
Letter Letter_z
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
5 3 4
6 4 5
7 5 6
8 6 7
9 7 8
10 5 6
11 6 7
12 7 8
13 8 9
14 9 10
Or if possible simplify problem with reshape all non z
values to one column and all z
values to another use numpy.ravel
:
m = df.columns.str.endswith('_z')
a = df.loc[:, ~m].to_numpy().T.ravel()
b = df.loc[:, m].to_numpy().T.ravel()
df = pd.DataFrame({'Letter': a,'Letter_z': b})
print (df)
Letter Letter_z
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
5 3 4
6 4 5
7 5 6
8 6 7
9 7 8
10 5 6
11 6 7
12 7 8
13 8 9
14 9 10
Upvotes: 3
Reputation: 17794
You can use the function concat
and a list comprehension:
cols = df.columns[~df.columns.str.endswith('_z')]
func = lambda x: 'letter_z' if x.endswith('_z') else 'letter'
pd.concat([df.filter(like=i).rename(func, axis=1) for i in cols])
or
cols = df.columns[~df.columns.str.endswith('_z')]
pd.concat([df.filter(like=i).set_axis(['letter', 'letter_z'], axis=1, inplace=False) for i in cols])
Upvotes: 2