Alicia_2024
Alicia_2024

Reputation: 531

Rename columns using part of the string in Pandas

I have a data frame that looks like below. The actual data frame has 64 columns.

  0      1      2 
app 2  tb 1   mt 3
app 0  tb 5   mt 2
app 0  tb 0   mt 6

I'd like to rename the columns using the substring (e.g. "app","tb"). The ideal data frame would look like below:

app tb mt 
2   1   3
0   5   2
0   0   6

I know how to subset to the numeric values using str.split(). However, how do I update the corresponding column using the first part of the string?

Upvotes: 1

Views: 1071

Answers (5)

BENY
BENY

Reputation: 323306

Let us chain the function of stack and unstack

out = df.stack().str.split(' ',expand=True).set_index(0,append=True)[1].reset_index(level=1,drop=True).unstack(level=-1)
0 app mt tb
0   2  3  1
1   0  2  5
2   0  6  0

Upvotes: 0

Cameron Riddell
Cameron Riddell

Reputation: 13417

To keep it as one fun method chaining solution:

new_df = (
    df.set_axis(
        df.loc[0, :].str.extract("^(.+)\s+", expand=False).tolist(), axis=1
    )
    .replace(regex="^(.+\s+)", value="")
)

print(new_df)
  app tb mt
0   2  1  3
1   0  5  2
2   0  0  6

Upvotes: 0

jtsw1990
jtsw1990

Reputation: 175

A way to do this would be to use the .column method for a pandas dataframe.

Assuming that all your df values are consistent and you want the first part of that string as a column name for all your 64 columns, you can do this:

df.columns = [x.split()[0] for x in df.loc[0, :]]
df = df.apply(lambda x: x.str.replace(r"^(.*\s+)", ""))

Which essentially makes use of a list comprehension (a more pythonic loop) and a string split method in order to manipulate the first-row values in your df. Now, if you print df.head(), you show see:

    app     tb      mt
0   2       1       3
1   0       5       2
2   0       0       6

Upvotes: 0

sammywemmy
sammywemmy

Reputation: 28699

You could reshape the data with melt before pulling out the strings:

   # flip the column names into rows
  (df.melt(ignore_index = False)
    .drop(columns = 'variable')
    # split the column into strings and number
    .loc[:, 'value'].str.split(expand=True)
    # flip the dataframe to get the headers
    .pivot(columns=0, values=1)
    .rename_axis(columns = None)
 )

  app mt tb
0   2  3  1
1   0  2  5
2   0  6  0

A shorter route, with inspiration from @AndrejKesely, would be to use the string functions on the dataframe itself; this should be faster :

Get the columns:

df.columns = df.iloc[0].str.split().str[0]

Remove the column names from each column:

df.transform(lambda df: df.str.split().str[-1]).rename_axis(columns = None)

  app tb mt
0   2  1  3
1   0  5  2
2   0  0  6

Upvotes: 0

Andrej Kesely
Andrej Kesely

Reputation: 195468

You can assign to .columns to rename the columns of dataframe. For example:

df.columns = df.iloc[0, :].str.extract(r"^(.*)\s+")[0]
df = df.apply(lambda x: x.str.replace(r"^(.*\s+)", ""))

print(df)

Prints:

  app tb mt
0   2  1  3
1   0  5  2
2   0  0  6

Upvotes: 2

Related Questions