Reputation: 229
I'd like to transform the following data frame, but can't seem to get the right function to do it. Any time I use 'melt', I am prompted to convert to an array, though an array does not seem to accept column names.
In any case, any help with turning this:
| A | B | C1 | C1.A | C2 | C2.A |
|---|---|----|------|----|------|
| PC | 11001 | Core | Old | SE | New |
Into this:
| A | B | C | C.A |
|---|---|---|-----|
| PC | 11001 | Core | Old |
| PC | 11001 | SE | New |
Would be greatly appreciated.
Code for the first and second df's, respectively.
import pandas as pd

df1 = {'A': ['PC'], 'B': [11001], 'C1': ['Core'], 'C1.A': ['Old'], 'C2': ['SE'], 'C2.A': ['New']}
df1 = pd.DataFrame(data=df1)
df2 = {'A': ['PC','PC'], 'B': [11001,11001],'C':['Core','SE'],'C.A':['Old','New']}
df2 = pd.DataFrame(data=df2)
Upvotes: 3
Views: 369
Reputation: 28644
In addition to the other solutions already proffered, pivot_longer from pyjanitor is another option -> your columns have a pattern that we can take advantage of (some end with a digit, others end with A):
# pip install pyjanitor
import janitor
import pandas as pd
df.pivot_longer(
    index=['A', 'B'],
    names_to=['C', 'C.A'],
    names_pattern=[r'.+\d$', r'.+A$'],
)
A B C C.A
0 PC 11001 Core Old
1 PC 11001 SE New
the columns in names_to become the new column names and are mapped to the old columns that match the regular expressions in names_pattern.
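To make that routing concrete, here is a small sketch of my reading of how names_pattern assigns columns (pyjanitor's internals may differ): each column goes to the first regex it matches, in order.

```python
import re

# The two regexes from names_pattern above; columns matching the first
# feed the new 'C' column, columns matching the second feed 'C.A'.
patterns = {'C': r'.+\d$', 'C.A': r'.+A$'}
cols = ['C1', 'C1.A', 'C2', 'C2.A']

# Route each column to the first pattern it matches.
mapping = {}
for col in cols:
    for name, pat in patterns.items():
        if re.match(pat, col):
            mapping[col] = name
            break

print(mapping)
# {'C1': 'C', 'C1.A': 'C.A', 'C2': 'C', 'C2.A': 'C.A'}
```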
Upvotes: 3
Reputation: 833
Thanks for typing out your sample code; I'll reuse it below. :)
Here is an idea: use groupby.apply to generate a new two-row DataFrame from each one-row group; groupby.apply then takes care of combining the new DataFrames. First, we make a custom apply function for groupby:
def make_new_df_from_row(df):
    row = df.iloc[0, :]  # df is each sub-frame produced by groupby
    dd = pd.DataFrame({  # hardcode the target layout
        'A': row.A, 'B': row.B,
        'C': [row.C1, row.C2],
        'C.A': [row['C1.A'], row['C2.A']],  # a name containing '.' needs bracket access, not row.C1.A
    })
    return dd
then use

df2 = df1.groupby(lambda x: x, group_keys=False).apply(make_new_df_from_row)

to get the result you want.
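Putting the pieces together, the answer runs end to end on the question's sample frame like this (same function as above, just wired up):

```python
import pandas as pd

def make_new_df_from_row(df):
    row = df.iloc[0, :]  # each group holds exactly one original row
    return pd.DataFrame({
        'A': row.A, 'B': row.B,
        'C': [row.C1, row.C2],
        'C.A': [row['C1.A'], row['C2.A']],  # '.' in the name forces bracket access
    })

df1 = pd.DataFrame({'A': ['PC'], 'B': [11001], 'C1': ['Core'],
                    'C1.A': ['Old'], 'C2': ['SE'], 'C2.A': ['New']})

# Grouping on the index value puts every row in its own group,
# so apply expands each one-row group into the two-row frame above.
df2 = df1.groupby(lambda x: x, group_keys=False).apply(make_new_df_from_row)
print(df2)
```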
some notes:

- group_keys=False means don't prepend the group key (grouped index) to the result.
- groupby.apply accepts a function whose input is each grouped DataFrame; you can return a pd.Series or pd.DataFrame depending on your need. Here we return the two-row DataFrame built from your sample code (with some variable changes).

Upvotes: 1
Reputation: 71689
Transform the columns to a MultiIndex using replace followed by split, then stack to reshape the frame into the desired format:
df = df1.set_index(['A', 'B'])
df.columns = (df.columns
              .str.replace(r'^(\D+)(\d+)(.*)', r'\1\3_\2', regex=True)
              .str.split('_', expand=True))
df = df.stack().droplevel(2).reset_index()
A B C C.A
0 PC 11001 Core Old
1 PC 11001 SE New
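If it helps to see the intermediate step, here is a quick sketch (using the question's df1) of the MultiIndex that the replace/split produces before stack:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['PC'], 'B': [11001], 'C1': ['Core'],
                    'C1.A': ['Old'], 'C2': ['SE'], 'C2.A': ['New']})

# Rewrite 'C1' -> 'C_1' and 'C1.A' -> 'C.A_1', then split on '_'
# to get a two-level column index: (name, number).
cols = (df1.set_index(['A', 'B']).columns
           .str.replace(r'^(\D+)(\d+)(.*)', r'\1\3_\2', regex=True)
           .str.split('_', expand=True))

print(cols.tolist())
# [('C', '1'), ('C.A', '1'), ('C', '2'), ('C.A', '2')]
```

stack() then moves the number level into the row index, and droplevel(2) discards it.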
Upvotes: 7
Reputation: 35626
With a little column reformatting, pd.wide_to_long can be used:
df.columns = df.columns.str.replace(r'C(\d+)\.A', r'C.A\1', regex=True)
df = (
    pd.wide_to_long(df, stubnames=['C', 'C.A'], i=['A', 'B'], j='drop')
    .droplevel(axis=0, level='drop')
    .reset_index()
)
df:
A B C C.A
0 PC 11001 Core Old
1 PC 11001 SE New
Change the columns in df from:
A B C1 C1.A C2 C2.A
0 PC 11001 Core Old SE New
to:
df.columns = df.columns.str.replace(r'C(\d+)\.A', r'C.A\1', regex=True)
A B C1 C.A1 C2 C.A2
0 PC 11001 Core Old SE New
Then wide_to_long on the stubnames C and C.A.
An alternative without mutating df:
new_df = (
    pd.wide_to_long(
        df.rename(columns=dict(
            zip(df.columns,
                df.columns.str.replace(r'C(\d+)\.A', r'C.A\1', regex=True)))),
        stubnames=['C', 'C.A'],
        i=['A', 'B'],
        j='drop')
    .droplevel(axis=0, level='drop')
    .reset_index()
)
new_df:
A B C C.A
0 PC 11001 Core Old
1 PC 11001 SE New
df:
A B C1 C1.A C2 C2.A
0 PC 11001 Core Old SE New
Upvotes: 4