Reputation: 998
I hope you can give me a hand here. So, I have this dataframe:
| | a | b | a.1 | b.1 | a.2 | b.2|
|--:|-----:|-----:|-----:|-----:|-----:|------|
| 0 | a111 | b111 | c222 | d222 | e333 | f333 |
Make a subset is not an option because is too many columns.
The output I expect is like this:
| | a | b |
|---|------|------|
| 0 | a111 | b111 |
| 1 | c222 | d222 |
| 2 | e333 | f333 |
Thanks in advance.
The code to replicate the dataframe:
list_demo = []
a = "a111"
b = "b111"
c = "c222"
d = 'd222'
e = 'e333'
f = "f333"
list_demo.append([a,b,c,d,e,f])
df = pd.DataFrame(list_demo)
df.columns = ['a', 'b', 'a.1', 'b.1', 'a.2', 'b.2']
Upvotes: 1
Views: 65
Reputation: 28644
One option is with pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(names_to = '.value', names_pattern = '(.).*')
a b
0 a111 b111
1 c222 d222
2 e333 f333
The .value
is a placeholder, which determines what sub parts of the column labels remain as header. The regex group in names_pattern
decides which parts are extracted.
Upvotes: 1
Reputation: 260640
You can use:
out = (df
.set_axis(df.columns.str.split('.', expand=True), axis=1)
.stack()
.droplevel(1)
)
Output:
a b
0 a111 b111
0 c222 d222
0 e333 f333
Or, if you have a single row:
(df.set_axis(df.columns.str.split('.', expand=True), axis=1)
.stack()
.reset_index(drop=True)
)
Output:
a b
0 a111 b111
1 c222 d222
2 e333 f333
Upvotes: 2