Reputation: 109
I forgot how to move the column names of a DataFrame into its first row the pandas way. I would also like the solution to count how many columns there are and generate the new column names from that count.
Example:
df = pd.DataFrame({'a': [1,2,3,4,5,6],
'b': [2,3,4,5,6,7],
'c': [2,3,4,5,6,7],
'd': [2,3,4,5,6,7],
'e': [2,3,4,5,6,7]})
Current output:
a b c d e
0 1 2 2 2 2
1 2 3 3 3 3
2 3 4 4 4 4
3 4 5 5 5 5
4 5 6 6 6 6
5 6 7 7 7 7
Expected output:
Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
0 a b c d e
1 1 2 2 2 2
2 2 3 3 3 3
3 3 4 4 4 4
4 4 5 5 5 5
5 5 6 6 6 6
6 6 7 7 7 7
I want to learn more idiomatic pandas, so if possible please use pandas methods as much as you can.
Upvotes: 1
Views: 9173
Reputation: 20669
You can use np.vstack:
# Use `df.to_numpy()` instead of `df.values`, as recommended in the docs.
new_df = pd.DataFrame(np.vstack([df.columns, df.to_numpy()]),
columns = [f'Q1.{i+1}' for i in range(df.shape[1])])
Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
0 a b c d e
1 1 2 2 2 2
2 2 3 3 3 3
3 3 4 4 4 4
4 4 5 5 5 5
5 5 6 6 6 6
6 6 7 7 7 7
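Since the question asks for a function that counts the columns and names them, the vstack approach could be wrapped in a small helper. This is only a sketch: the function name header_to_first_row and its prefix parameter are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def header_to_first_row(df, prefix="Q1."):
    """Return a new frame whose first row holds the old column labels."""
    # Count the columns and build one new label per column: Q1.1, Q1.2, ...
    new_columns = [f"{prefix}{i + 1}" for i in range(df.shape[1])]
    # Stack the old labels (one row) on top of the data (n rows).
    stacked = np.vstack([df.columns, df.to_numpy()])
    return pd.DataFrame(stacked, columns=new_columns)


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
new_df = header_to_first_row(df)
print(new_df)
```

The original df is left untouched, because the helper builds a new frame instead of modifying its argument.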
Or you can use np.r_:
# np.r_[[df.columns], df.to_numpy()]
pd.DataFrame(np.r_['0,2', df.columns, df.to_numpy()],
columns = [f'Q1.{i+1}' for i in range(df.shape[1])])
Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
0 a b c d e
1 1 2 2 2 2
2 2 3 3 3 3
3 3 4 4 4 4
4 4 5 5 5 5
5 5 6 6 6 6
6 6 7 7 7 7
Or, using np.concatenate:
np.concatenate([[df.columns], df.values],axis=0)
If the column names are allowed to start from Q1.0, you can use add_prefix:
pd.DataFrame(np.vstack([df.columns, df.to_numpy()])).add_prefix('Q1.')
Q1.0 Q1.1 Q1.2 Q1.3 Q1.4
0 a b c d e
1 1 2 2 2 2
2 2 3 3 3 3
3 3 4 4 4 4
4 4 5 5 5 5
5 5 6 6 6 6
6 6 7 7 7 7
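If the names have to start from Q1.1 but you still want to avoid the list comprehension, one option is to pass a function to rename, which is applied to each of the default integer labels. A minimal sketch on a smaller frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Without explicit columns the new frame gets the integer labels 0, 1, ...
stacked = pd.DataFrame(np.vstack([df.columns, df.to_numpy()]))

# rename accepts a function; shift each integer label by one and add the prefix.
new_df = stacked.rename(columns=lambda i: f"Q1.{i + 1}")
print(new_df.columns.tolist())
```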
Timeit results: the df given in the question is used for benchmarking.
# Ansev's answer
In [98]: %%timeit
...: (df.T.reset_index().T.reset_index(drop=True)
...: .set_axis([f'Q1.{i+1}' for i in range(df.shape[1])], axis=1))
...:
1.93 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# My answer
In [99]: %%timeit
...: pd.DataFrame(np.vstack([df.columns, df.to_numpy()]),
...: columns = [f'Q1.{i+1}' for i in range(df.shape[1])])
...:
590 µs ± 43.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Benchmarking with a large DataFrame of shape (1,000,000, 5):
large_df = pd.DataFrame(np.random.randint(0,9,(1_000_000,5)),
columns = ['a', 'b', 'c', 'd', 'e'])
a b c d e
0 3 8 0 8 5
1 7 4 0 0 7
2 5 1 2 6 1
3 8 0 5 5 6
4 0 2 3 1 8
... .. .. .. .. ..
999995 1 7 3 8 7
999996 5 2 5 1 6
999997 7 4 4 3 5
999998 3 5 2 2 7
999999 6 7 0 8 0
[1000000 rows x 5 columns]
# My answer
In [105]: %%timeit
...: pd.DataFrame(np.vstack([large_df.columns, large_df.to_numpy()]),
...: columns = [f'Q1.{i+1}' for i in range(large_df.shape[1])])
...:
147 ms ± 16.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Ansev's answer
In [107]: %%timeit
...: (large_df.T.reset_index().T.reset_index(drop=True)
...: .set_axis([f'Q1.{i+1}' for i in range(large_df.shape[1])], axis=1))
...:
469 ms ± 3.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
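The %%timeit cells above need IPython. Outside of it, a rough equivalent can be written with the standard-library timeit module. This is a sketch on a smaller random frame (1,000 rows, chosen only to keep it quick); the helper names with_vstack and with_transpose are made up here, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 9, (1_000, 5)), columns=list("abcde"))
new_columns = [f"Q1.{i+1}" for i in range(df.shape[1])]


def with_vstack():
    return pd.DataFrame(np.vstack([df.columns, df.to_numpy()]), columns=new_columns)


def with_transpose():
    return (df.T.reset_index().T.reset_index(drop=True)
              .set_axis(new_columns, axis=1))


# Check that both approaches hold the same values before comparing their speed.
same_result = np.array_equal(with_vstack().astype(str).to_numpy(),
                             with_transpose().astype(str).to_numpy())
print("same values:", same_result)

for func in (with_vstack, with_transpose):
    seconds = timeit.timeit(func, number=20) / 20
    print(f"{func.__name__}: {seconds * 1e3:.3f} ms per call")
```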
Upvotes: 3
Reputation: 30920
One line with DataFrame.T + DataFrame.reset_index(). You can set the names of the columns with DataFrame.set_axis():
new_df = (df.T.reset_index().T.reset_index(drop=True)
.set_axis([f'Q1.{i+1}' for i in range(df.shape[1])], axis=1))
print(new_df)
Output
Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
0 a b c d e
1 1 2 2 2 2
2 2 3 3 3 3
3 3 4 4 4 4
4 4 5 5 5 5
5 5 6 6 6 6
6 6 7 7 7 7
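One thing to keep in mind: once the labels sit in the first row, every column mixes strings with numbers, so the columns become object dtype. The sketch below, on a smaller frame, shows how to check this and how the numeric part could be recovered by skipping the first row.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

new_df = (df.T.reset_index().T.reset_index(drop=True)
            .set_axis([f"Q1.{i+1}" for i in range(df.shape[1])], axis=1))

# Each column now holds a string label plus numbers, so it is object dtype.
all_object = all(dtype == object for dtype in new_df.dtypes)
print(all_object)

# Skip the label row and convert the remaining rows back to numbers.
numeric_part = new_df.iloc[1:].apply(pd.to_numeric)
print(numeric_part.dtypes)
```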
Upvotes: 9
Reputation: 46849
Here is one version:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
'b': [2, 3, 4, 5, 6, 7],
'c': [2, 3, 4, 5, 6, 7],
'd': [2, 3, 4, 5, 6, 7],
'e': [2, 3, 4, 5, 6, 7]})
df.loc[-1] = df.columns.values
df.sort_index(inplace=True)
df.reset_index(drop=True, inplace=True)
df.rename(columns=
{"a": "Q1.1", "b": "Q1.2", "c": "Q1.3", "d": "Q1.4", "e": "Q1.5"},
inplace=True)
Here I first add a new (last) row with df.loc[-1], then sort the index with df.sort_index(inplace=True) so that this row, which has index -1, becomes the first row. Finally I reset the index with df.reset_index(drop=True, inplace=True) so that it starts from 0 again.
It outputs:
Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
0 a b c d e
1 1 2 2 2 2
2 2 3 3 3 3
3 3 4 4 4 4
4 4 5 5 5 5
5 5 6 6 6 6
6 6 7 7 7 7
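The rename mapping above is written out by hand. To cover the part of the question about counting the columns, the mapping could be built from the existing labels instead. A minimal sketch of the same steps on a smaller frame, without inplace=True:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# Build {"a": "Q1.1", "b": "Q1.2", ...} from the current labels.
mapping = {old: f"Q1.{i}" for i, old in enumerate(df.columns, start=1)}

# Append the labels as a row with index -1.
df.loc[-1] = df.columns.to_numpy()
# Sorting moves that row to the top; resetting renumbers the index from 0.
df = df.sort_index().reset_index(drop=True)
df = df.rename(columns=mapping)
print(df)
```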
Upvotes: 4
Reputation: 2819
You can do it like this:
data={"A":[4,3,4],"B":[5,2,7],"C":[3,5,9],"D":[6,3,0]}
df=pd.DataFrame(data)
df.loc[-1]=df.columns
df.index = df.index + 1 # shifting index
df.sort_index(inplace=True)
df.columns=["Q1.1","Q1.2","Q1.3","Q1.4"]
Result:
Q1.1 Q1.2 Q1.3 Q1.4
0 A B C D
1 4 5 3 6
2 3 2 5 3
3 4 7 9 0
Upvotes: 1
Reputation: 3598
Try:
df = pd.DataFrame({'a': [1,2,3,4,5,6],
'b': [2,3,4,5,6,7],
'c': [2,3,4,5,6,7],
'd': [2,3,4,5,6,7],
'e': [2,3,4,5,6,7]})
df.loc[-1,:] = df.columns
df.index += 1
df.sort_index(inplace = True)
df.columns=['Q1.1','Q1.2','Q1.3','Q1.4','Q1.5']
Result:
Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
0 a b c d e
1 1 2 2 2 2
2 2 3 3 3 3
3 3 4 4 4 4
4 4 5 5 5 5
5 5 6 6 6 6
6 6 7 7 7 7
Upvotes: 0