Reputation: 531
I have a data frame that looks like below. The actual data frame has 64 columns.
0 1 2
app 2 tb 1 mt 3
app 0 tb 5 mt 2
app 0 tb 0 mt 6
I'd like to rename the columns using the substring (e.g. "app","tb"). The ideal data frame would look like below:
app tb mt
2 1 3
0 5 2
0 0 6
I know how to subset to the numeric values using str.split()
. However, how do I update the corresponding column using the first part of the string?
Upvotes: 1
Views: 1071
Reputation: 323306
Let us chain the function of stack
and unstack
out = df.stack().str.split(' ',expand=True).set_index(0,append=True)[1].reset_index(level=1,drop=True).unstack(level=-1)
0 app mt tb
0 2 3 1
1 0 2 5
2 0 6 0
Upvotes: 0
Reputation: 13417
To keep it as one fun method chaining solution:
new_df = (
df.set_axis(
df.loc[0, :].str.extract("^(.+)\s+", expand=False).tolist(), axis=1
)
.replace(regex="^(.+\s+)", value="")
)
print(new_df)
app tb mt
0 2 1 3
1 0 5 2
2 0 0 6
Upvotes: 0
Reputation: 175
A way to do this would be to use the .column method for a pandas dataframe.
Assuming that all your df values are consistent and you want the first part of that string as a column name for all your 64 columns, you can do this:
df.columns = [x.split()[0] for x in df.loc[0, :]]
df = df.apply(lambda x: x.str.replace(r"^(.*\s+)", ""))
Which essentially makes use of a list comprehension (a more pythonic loop) and a string split method in order to manipulate the first-row values in your df. Now, if you print df.head(), you show see:
app tb mt
0 2 1 3
1 0 5 2
2 0 0 6
Upvotes: 0
Reputation: 28699
You could reshape the data with melt
before pulling out the strings:
# flip the column names into rows
(df.melt(ignore_index = False)
.drop(columns = 'variable')
# split the column into strings and number
.loc[:, 'value'].str.split(expand=True)
# flip the dataframe to get the headers
.pivot(columns=0, values=1)
.rename_axis(columns = None)
)
app mt tb
0 2 3 1
1 0 2 5
2 0 6 0
A shorter route, with inspiration from @AndrejKesely, would be to use the string functions on the dataframe itself; this should be faster :
Get the columns:
df.columns = df.iloc[0].str.split().str[0]
Remove the column names from each column:
df.transform(lambda df: df.str.split().str[-1]).rename_axis(columns = None)
app tb mt
0 2 1 3
1 0 5 2
2 0 0 6
Upvotes: 0
Reputation: 195468
You can assign to .columns
to rename the columns of dataframe. For example:
df.columns = df.iloc[0, :].str.extract(r"^(.*)\s+")[0]
df = df.apply(lambda x: x.str.replace(r"^(.*\s+)", ""))
print(df)
Prints:
app tb mt
0 2 1 3
1 0 5 2
2 0 0 6
Upvotes: 2