Reputation: 25
Scenario: I have two data frames:
result_datapd: columns A B C D E F G H
result_datapd2: columns A C E G H
I'm trying to insert the missing columns into result_datapd2 so that it ends up as:
result_datapd2: columns A B C D E F G H
This is my code:
for n in range(len(result_datapd.columns.difference(result_datapd2.columns))):
    name_column = result_datapd.columns.difference(result_datapd2.columns)[n]
    loc_column = result_datapd.columns.get_loc(name_column)
    print(name_column)
    print(loc_column)
    result_datapd2.insert(loc=loc_column, column=name_column, value='')
When I run it, I receive this error:
IndexError: index 2 is out of bounds for axis 0 with size 1
And the resulting dataframe looks something like this:
result_datapd2: columns A B C E D G H
Upvotes: 2
Views: 71
Reputation: 4929
You're modifying result_datapd2 in place (adding columns) inside the loop, which changes the column difference the loop depends on. If you want the same column order as result_datapd, you can add the columns normally and then reorder them accordingly:
import pandas as pd

# Data
result_datapd = pd.DataFrame(columns=list('ABCDEFGH'))
result_datapd2 = pd.DataFrame(columns=list('ACEGH'))
# Get the columns that exist only in result_datapd
distinct_cols = result_datapd.columns.difference(result_datapd2.columns)
# Add the new columns (they are appended at the end)
result_datapd2[distinct_cols] = result_datapd[distinct_cols]
# Order columns like result_datapd; assign the result back to keep the new order
result_datapd2 = result_datapd2[result_datapd.columns]
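If the new columns only need a placeholder value (the question inserts an empty string), a possible shortcut for "add the missing columns, then reorder" is `DataFrame.reindex`. This is a minimal sketch rather than the approach above; the sample rows in result_datapd2 are made up for illustration.

```python
import pandas as pd

result_datapd = pd.DataFrame(columns=list('ABCDEFGH'))
# Sample rows are an assumption, only here to make the result visible
result_datapd2 = pd.DataFrame(
    {'A': [1, 2], 'C': [3, 4], 'E': [5, 6], 'G': [7, 8], 'H': [9, 10]}
)

# reindex returns a new frame whose columns follow result_datapd.columns;
# columns that result_datapd2 lacks (B, D, F) are created and filled with ''
result_datapd2 = result_datapd2.reindex(columns=result_datapd.columns, fill_value='')

print(list(result_datapd2.columns))
```

Unlike the assignment above, this does not copy any values from result_datapd; it only fills the new columns with the given placeholder.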
Upvotes: 2
Reputation: 64
Inside your for-loop you modify result_datapd2 by adding a column on each iteration. The difference is recomputed from the modified frame every time, so it shrinks as columns get added while n keeps growing. That's why the index goes out of bounds. If you compute the difference against a copy of the original dataframe, this is no longer an issue. However, a more space-efficient way might be to just copy the column names into a list beforehand.
import pandas as pd

result_datapd = pd.DataFrame(columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
result_datapd2 = pd.DataFrame(columns=['A', 'C', 'E', 'G', 'H'])
# Unmodified copy, so the difference stays the same on every iteration
result_datapd2_copy = result_datapd2.copy()
for n in range(len(result_datapd.columns.difference(result_datapd2_copy.columns))):
    name_column = result_datapd.columns.difference(result_datapd2_copy.columns)[n]
    loc_column = result_datapd.columns.get_loc(name_column)
    print(name_column)
    print(loc_column)
    result_datapd2.insert(loc=loc_column, column=name_column, value='')
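The list-based alternative mentioned above could look roughly like this. It is only a sketch: the name missing_columns and the single sample row are assumptions added for illustration.

```python
import pandas as pd

result_datapd = pd.DataFrame(columns=list('ABCDEFGH'))
# One made-up row so the inserted placeholder values are visible
result_datapd2 = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list('ACEGH'))

# Snapshot the missing column names once, before result_datapd2 is modified.
# Iterating over result_datapd.columns keeps them in target order (B, D, F),
# so every insert position is within bounds when it is used.
missing_columns = [col for col in result_datapd.columns
                   if col not in result_datapd2.columns]

for name_column in missing_columns:
    loc_column = result_datapd.columns.get_loc(name_column)
    result_datapd2.insert(loc=loc_column, column=name_column, value='')

print(list(result_datapd2.columns))
```

Only the column names are stored here, so no second dataframe is kept in memory.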
Upvotes: 1