Anony

Reputation: 109

move column names to first row in pandas frame

I forgot how to move the column names of a dataframe into its first row in a pandas way. I would also like the solution to count how many columns there are and then assign new column names accordingly.

Example:

df = pd.DataFrame({'a': [1,2,3,4,5,6],
                  'b': [2,3,4,5,6,7],
                  'c': [2,3,4,5,6,7],
                  'd': [2,3,4,5,6,7],
                  'e': [2,3,4,5,6,7]})

Current output:

    a   b   c   d   e
0   1   2   2   2   2
1   2   3   3   3   3
2   3   4   4   4   4
3   4   5   5   5   5
4   5   6   6   6   6
5   6   7   7   7   7

Expected output:

    Q1.1    Q1.2    Q1.3    Q1.4    Q1.5
0   a   b   c   d   e
1   1   2   2   2   2
2   2   3   3   3   3
3   3   4   4   4   4
4   4   5   5   5   5
5   5   6   6   6   6
6   6   7   7   7   7

I would like to learn more pandas ways of processing dataframes, so please use pandas idioms as much as possible.

Upvotes: 1

Views: 9173

Answers (5)

Ch3steR

Reputation: 20669

You can use np.vstack

import numpy as np
import pandas as pd

# Use `df.to_numpy()` rather than `df.values`, as the pandas docs recommend.
new_df = pd.DataFrame(np.vstack([df.columns, df.to_numpy()]),
                      columns=[f'Q1.{i+1}' for i in range(df.shape[1])])

  Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
0    a    b    c    d    e
1    1    2    2    2    2
2    2    3    3    3    3
3    3    4    4    4    4
4    4    5    5    5    5
5    5    6    6    6    6
6    6    7    7    7    7

Or

You can use np.r_ here

# '0,2' means: concatenate along axis 0 and promote each input to at least 2-D.
# Equivalent: np.r_[[df.columns], df.to_numpy()]
pd.DataFrame(np.r_['0,2', df.columns, df.to_numpy()],
             columns=[f'Q1.{i+1}' for i in range(df.shape[1])])

  Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
0    a    b    c    d    e
1    1    2    2    2    2
2    2    3    3    3    3
3    3    4    4    4    4
4    4    5    5    5    5
5    5    6    6    6    6
6    6    7    7    7    7

Or

Using np.concatenate

np.concatenate([[df.columns], df.to_numpy()], axis=0)
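The expression above only builds the stacked array. A minimal sketch of turning it into the final frame might look like the following; the small two-column df and the explicit object-dtype header are my own assumptions for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumption: any frame with unique column names works).
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Turn the header into a 1 x n_columns object array so that the numbers
# below it are not converted to strings when the arrays are joined.
header = np.asarray(df.columns, dtype=object).reshape(1, -1)

# Stack the header row on top of the data rows.
stacked = np.concatenate([header, df.to_numpy()], axis=0)

# Build the new names from the number of columns.
new_df = pd.DataFrame(stacked,
                      columns=[f'Q1.{i+1}' for i in range(df.shape[1])])
print(new_df)
```

Note that every column of new_df ends up with object dtype, because each one now mixes a string with numbers.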

If the column names may start at Q1.0 instead of Q1.1, you can use add_prefix:

pd.DataFrame(np.vstack([df.columns, df.to_numpy()])).add_prefix('Q1.')

  Q1.0 Q1.1 Q1.2 Q1.3 Q1.4
0    a    b    c    d    e
1    1    2    2    2    2
2    2    3    3    3    3
3    3    4    4    4    4
4    4    5    5    5    5
5    5    6    6    6    6
6    6    7    7    7    7
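If the numbering has to start at 1 but you still want to avoid writing the list of names by hand, one possible variation is to pass a callable to DataFrame.rename. This is only a sketch on a small made-up frame; it relies on the default integer column labels 0, 1, 2, ... that pd.DataFrame assigns to an unlabelled array.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumption, not the frame from the question).
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# The stacked array has no column labels, so pandas assigns 0, 1, 2, ...
stacked = pd.DataFrame(np.vstack([df.columns, df.to_numpy()]))

# rename() calls the function once per existing label (an int here).
new_df = stacked.rename(columns=lambda i: f'Q1.{i + 1}')
print(new_df)
```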

Timeit results: the df given in the question is used for benchmarking

# Ansev's answer
In [98]: %%timeit
    ...: (df.T.reset_index().T.reset_index(drop=True)
    ...:             .set_axis([f'Q1.{i+1}' for i in range(df.shape[1])], axis=1))
    ...:
1.93 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# My answer
In [99]: %%timeit
    ...: pd.DataFrame(np.vstack([df.columns, df.to_numpy()]),
    ...:                       columns = [f'Q1.{i+1}' for i in range(df.shape[1])])
    ...:
590 µs ± 43.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Benchmarking with a large dataframe of shape (1,000,000, 5)

large_df = pd.DataFrame(np.random.randint(0,9,(1_000_000,5)),
                        columns = ['a', 'b', 'c', 'd', 'e'])
        a  b  c  d  e
0       3  8  0  8  5
1       7  4  0  0  7
2       5  1  2  6  1
3       8  0  5  5  6
4       0  2  3  1  8
...    .. .. .. .. ..
999995  1  7  3  8  7
999996  5  2  5  1  6
999997  7  4  4  3  5
999998  3  5  2  2  7
999999  6  7  0  8  0

[1000000 rows x 5 columns]

# My answer
In [105]: %%timeit
     ...: pd.DataFrame(np.vstack([large_df.columns, large_df.to_numpy()]),
     ...:              columns = [f'Q1.{i+1}' for i in range(large_df.shape[1])])
     ...:
147 ms ± 16.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Ansev's answer

In [107]: %%timeit
     ...: (large_df.T.reset_index().T.reset_index(drop=True)
     ...:             .set_axis([f'Q1.{i+1}' for i in range(large_df.shape[1])], axis=1))
     ...:
469 ms ± 3.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Upvotes: 3

ansev

Reputation: 30920

A one-liner using DataFrame.T and DataFrame.reset_index(). You can then set the column names with DataFrame.set_axis()

new_df = (df.T.reset_index().T.reset_index(drop=True)
            .set_axis([f'Q1.{i+1}' for i in range(df.shape[1])], axis=1))
print(new_df)

Output

  Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
0    a    b    c    d    e
1    1    2    2    2    2
2    2    3    3    3    3
3    3    4    4    4    4
4    4    5    5    5    5
5    5    6    6    6    6
6    6    7    7    7    7
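To see why the chain works, it may help to split it into its individual steps. The sketch below uses a small made-up frame and hypothetical names step1 to step4; it assumes the columns have no name, in which case reset_index() calls the new column 'index'.

```python
import pandas as pd

# Small illustrative frame (assumption, not the frame from the question).
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

step1 = df.T                          # column labels become the row index
step2 = step1.reset_index()           # labels move into a regular column named 'index'
step3 = step2.T                       # transpose back: the labels are now the first row
step4 = step3.reset_index(drop=True)  # replace the leftover row labels with 0..n

# Generate one new name per column.
new_df = step4.set_axis([f'Q1.{i+1}' for i in range(df.shape[1])], axis=1)
print(new_df)
```

Because of the two transposes, all columns of the result have object dtype.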

Upvotes: 9

hiro protagonist

Reputation: 46849

Here is one version:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [2, 3, 4, 5, 6, 7],
                   'c': [2, 3, 4, 5, 6, 7],
                   'd': [2, 3, 4, 5, 6, 7],
                   'e': [2, 3, 4, 5, 6, 7]})

df.loc[-1] = df.columns.values
df.sort_index(inplace=True)
df.reset_index(drop=True, inplace=True)

df.rename(columns=
    {"a": "Q1.1", "b": "Q1.2", "c": "Q1.3", "d": "Q1.4", "e": "Q1.5"}, 
    inplace=True)

Here I first add a new row with df.loc[-1], then sort the index (df.sort_index(inplace=True)) so that this row, which has index -1, becomes the first row. Finally I reset the index with df.reset_index(drop=True, inplace=True) so that it starts from 0 again.

It outputs:

  Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
0    a    b    c    d    e
1    1    2    2    2    2
2    2    3    3    3    3
3    3    4    4    4    4
4    4    5    5    5    5
5    5    6    6    6    6
6    6    7    7    7    7
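The question also asks for the new names to be derived from the number of columns. A possible way to wrap the steps above in a function that does this, without modifying the original frame, is sketched below. The function name header_to_first_row and the prefix parameter are hypothetical, and the sketch assumes the frame has a default integer index with no negative labels.

```python
import pandas as pd


def header_to_first_row(df, prefix='Q1.'):
    """Return a new frame whose first row holds the old column names."""
    # astype returns a new frame, so the caller's df is left untouched;
    # object dtype lets strings and numbers share a column.
    out = df.astype(object)
    out.loc[-1] = df.columns.to_numpy()            # add the header as a row labelled -1
    out = out.sort_index().reset_index(drop=True)  # -1 sorts first, then renumber from 0
    # One generated name per column: Q1.1, Q1.2, ...
    out.columns = [f'{prefix}{i}' for i in range(1, out.shape[1] + 1)]
    return out


df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
result = header_to_first_row(df)
print(result)
```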

Upvotes: 4

Renaud

Reputation: 2819

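You can do it like this:

import pandas as pd

data = {"A": [4, 3, 4], "B": [5, 2, 7], "C": [3, 5, 9], "D": [6, 3, 0]}
df = pd.DataFrame(data)

df.loc[-1] = df.columns        # append the column names as a row labelled -1
df.index = df.index + 1        # shift the index so the new row gets label 0
df.sort_index(inplace=True)    # move the new row to the top
df.columns = ["Q1.1", "Q1.2", "Q1.3", "Q1.4"]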

Result:

  Q1.1 Q1.2 Q1.3 Q1.4
0    A    B    C    D
1    4    5    3    6
2    3    2    5    3
3    4    7    9    0

Upvotes: 1

ipj

Reputation: 3598

Try:

df = pd.DataFrame({'a': [1,2,3,4,5,6],
                  'b': [2,3,4,5,6,7],
                  'c': [2,3,4,5,6,7],
                  'd': [2,3,4,5,6,7],
                  'e': [2,3,4,5,6,7]})
df.loc[-1,:] = df.columns
df.index += 1
df.sort_index(inplace = True)
df.columns=['Q1.1','Q1.2','Q1.3','Q1.4','Q1.5']

result:

  Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
0    a    b    c    d    e
1    1    2    2    2    2
2    2    3    3    3    3
3    3    4    4    4    4
4    4    5    5    5    5
5    5    6    6    6    6
6    6    7    7    7    7

Upvotes: 0
