I'm trying to work with an Excel file using Python and pandas. The file has a huge number of columns and rows, but I will try to simplify it with this example:
Name   Age  Nationality  Name1  Age1  Nationality1  Name2  Age2  Nationality2
Jane   32   Canada
Pedro  25   Spain
                         Lucas  30    Italy
                         Ana    23    Germany
                                                    Pedro  43    Brazil
                                                    Lucas  32    Mexico
So, in this example, I have the columns Name, Age and Nationality, but I also have Name1, Age1 and Nationality1 (and Name2, Age2, Nationality2). Since I want to filter by value, that wouldn't work directly, because I would have to filter each column separately: Name, Name1 and Name2.
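Filtering several columns with the same meaning can still be done in one expression. A minimal sketch, assuming the headers follow the `Name`, `Name1`, `Name2` pattern above: `DataFrame.filter(regex=...)` selects all the name columns, `.eq()` compares them to the value, and `.any(axis=1)` keeps a row if any of them matches. The sample frame and the value `"Pedro"` are only for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
columns = ["Name", "Age", "Nationality",
           "Name1", "Age1", "Nationality1",
           "Name2", "Age2", "Nationality2"]
# Hypothetical sample with the same layout as the example above.
df = pd.DataFrame([
    ["Jane", 32, "Canada"] + [nan] * 6,
    ["Pedro", 25, "Spain"] + [nan] * 6,
    [nan] * 3 + ["Lucas", 30, "Italy"] + [nan] * 3,
    [nan] * 3 + ["Ana", 23, "Germany"] + [nan] * 3,
    [nan] * 6 + ["Pedro", 43, "Brazil"],
    [nan] * 6 + ["Lucas", 32, "Mexico"],
], columns=columns)

# Select Name, Name1, Name2 (the anchors keep "Nationality" out).
name_cols = df.filter(regex=r"^Name\d*$")
# True for a row if any of its name columns equals "Pedro"; NaN compares as False.
mask = name_cols.eq("Pedro").any(axis=1)
print(df[mask].index.tolist())  # [1, 4]
```

This still leaves each match spread across different columns, so it helps for selecting rows but not for reading the matched ages or nationalities in one place.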
I thought one option could be converting the data into separate dictionaries and filtering those, but given the number of columns and rows, I guess that would take much longer.
I also thought I could rename the columns, but from what I found, column names have to be unique. Please correct me if I'm wrong.
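Renaming can work if it is done one column group at a time: each slice (`Name1`, `Age1`, `Nationality1`, and so on) is renamed to the shared base names, so the names stay unique within every slice, and the slices are then stacked with `pd.concat` into one long table that can be filtered once. A minimal sketch, assuming the groups use the suffixes `""`, `"1"` and `"2"` as in the example; the sample frame is made up to match that layout.

```python
import numpy as np
import pandas as pd

nan = np.nan
columns = ["Name", "Age", "Nationality",
           "Name1", "Age1", "Nationality1",
           "Name2", "Age2", "Nationality2"]
# Hypothetical sample with the same layout as the example above.
df = pd.DataFrame([
    ["Jane", 32, "Canada"] + [nan] * 6,
    ["Pedro", 25, "Spain"] + [nan] * 6,
    [nan] * 3 + ["Lucas", 30, "Italy"] + [nan] * 3,
    [nan] * 3 + ["Ana", 23, "Germany"] + [nan] * 3,
    [nan] * 6 + ["Pedro", 43, "Brazil"],
    [nan] * 6 + ["Lucas", 32, "Mexico"],
], columns=columns)

base = ["Name", "Age", "Nationality"]
suffixes = ["", "1", "2"]  # assumed group suffixes; extend for more groups

parts = []
for suffix in suffixes:
    group = [col + suffix for col in base]
    # Rename e.g. Name1 -> Name so every slice shares the same unique headers,
    # and drop rows where this group is completely empty.
    part = df[group].rename(columns=dict(zip(group, base)))
    parts.append(part.dropna(how="all"))

# Each part keeps its original row labels; sort_index restores the row order.
long_df = pd.concat(parts).sort_index()
print(long_df)

# A single filter now covers every group.
print(long_df[long_df["Name"] == "Pedro"])
```

Because `pd.concat` aligns the slices by column name, the three renamed groups line up under one `Name`, `Age` and `Nationality` header.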
Does anyone have a solution for this? It would be very helpful. Thanks in advance!
You can use bfill(axis=1) to copy the first non-null value in each row into every empty column before it. In the first iteration of the loop, the whole Name column is populated this way. If you set that column as the index and then replace all other occurrences of those names in the df with NaN, you can repeat the process for the remaining columns and end up with what you want.
import pandas as pd
import numpy as np
df = pd.read_csv('name_age_nationality.csv')
Name Age Nationality Name1 Age1 Nationality1 Name2 Age2 Nationality2
0 Jane 32.0 Canada NaN NaN NaN NaN NaN NaN
1 Pedro 25.0 Spain NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN Lucas 30.0 Italy NaN NaN NaN
3 NaN NaN NaN Ana 23.0 Germany NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN Pedro 43.0 Brazil
5 NaN NaN NaN NaN NaN NaN Lucas 32.0 Mexico
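for x in ['Name', 'Age', 'Nationality']:
    # fill each gap from the next non-null cell to its right, then move column x into the index
    df = df.bfill(axis=1).set_index(x)
    # blank out the stray copies of the values now held in the index
    df = df.replace(df.index.values, np.nan).reset_index()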
df[['Name','Age','Nationality']]
Output
Name Age Nationality
0 Jane 32 Canada
1 Pedro 25 Spain
2 Lucas 30 Italy
3 Ana 23 Germany
4 Pedro 43 Brazil
5 Lucas 32 Mexico
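To see why the loop needs the `replace` step, it helps to run the first iteration on a single row. This is a small sketch; the one-row frame is made up to match row 2 of the example. `bfill(axis=1)` fills every empty cell from the next non-null cell to its right, so `Age` and `Nationality` briefly receive the name too, and `replace` turns those stray copies back into NaN before the next column is processed.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frame laid out like row 2 of the example:
# the record sits in the "1" group and the first group is empty.
row = pd.DataFrame(
    [[np.nan, np.nan, np.nan, "Lucas", 30, "Italy"]],
    columns=["Name", "Age", "Nationality", "Name1", "Age1", "Nationality1"],
)

# Each NaN takes the nearest non-null value to its right. For Name, Age and
# Nationality that is Name1, so all three become "Lucas".
filled = row.bfill(axis=1)
print(filled.iloc[0].tolist())

# First loop iteration: move Name into the index, then blank out the stray
# copies of "Lucas" left in Age, Nationality and Name1.
step = filled.set_index("Name")
cleared = step.replace(step.index.tolist(), np.nan).reset_index()
print(cleared.iloc[0].tolist())
```

Since `replace` runs over the whole frame, a value that also appears in a different field of another row (for example, a name that is also a nationality) would be blanked out as well, so the approach assumes the three fields do not share values.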
You can get all the column headers as a list. Can you be more specific about the final result you want?
list(my_dataframe.columns.values)
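Building on that list of headers, the column groups can be derived rather than typed by hand. A minimal sketch, assuming every repeated header is a base name followed by an optional number (`Name`, `Name1`, `Name2`, ...); a base name that itself ends in digits would be split incorrectly. `group_columns` is a hypothetical helper, not a pandas function.

```python
import re
from collections import defaultdict

import pandas as pd


def group_columns(columns):
    """Hypothetical helper: map each numeric suffix ('', '1', '2', ...) to its headers."""
    groups = defaultdict(list)
    for col in columns:
        # Lazy base name plus trailing digits, e.g. "Age1" -> ("Age", "1").
        match = re.fullmatch(r"(.*?)(\d*)", col)
        groups[match.group(2)].append(col)
    return dict(groups)


# Empty frame with the headers from the question, just to have a column list.
df = pd.DataFrame(columns=["Name", "Age", "Nationality",
                           "Name1", "Age1", "Nationality1",
                           "Name2", "Age2", "Nationality2"])
groups = group_columns(list(df.columns.values))
print(groups)
```

Each list can then be sliced out of the DataFrame, renamed to the base names and stacked with `pd.concat`.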