Reputation: 1017
I have 2 data frames, and I'd like to join the first data frame with data from the second data frame, based on their index. The catch is that I do this iteratively, and the column label of only the first df increases by one with each iteration, which causes an error.
For example, the first df after the first iteration:
         0
440  7.691
Second df after the first iteration (it doesn't change between iterations):
     1
0    M
1    M
2    M
3    M
4    M
..  ..
440  B
441  M
442  M
When I run the code below, I get the desired df:
df_with_label = first_df.join(self.second_df)
         0  1
440  7.691  B
After the second iteration, my first df is now:
        1
3   10.72
When I run the same df_with_label = first_df.join(self.second_df), I'd like to get:
       1  2
3  10.72  M
But I get the error:
ValueError: columns overlap but no suffix specified: Int64Index([1], dtype='int64')
I'm guessing the problem is that after the second iteration the column label of the first df is 1, which is also the label of the column in the second df, but I don't know how to fix it. I'd like the label of the first df's column to keep increasing.
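A minimal sketch that reproduces the problem, assuming small hand-built frames that mimic the ones above (the row labels 0..442 and most values are made up; `self.` is dropped because the surrounding class is not shown). The first join works because the labels differ (0 vs 1); the second raises because both frames now have a column labelled 1.

```python
import pandas as pd

# Hypothetical stand-in for second_df: a single class column labelled 1,
# indexed 0..442. Values are made up except row 3 ("M") and row 440 ("B").
labels = ["M"] * 443
labels[440] = "B"
second_df = pd.DataFrame({1: labels})

# First iteration: first_df's only column is labelled 0, so there is no overlap.
first_df = pd.DataFrame({0: [7.691]}, index=[440])
df_with_label = first_df.join(second_df)  # left join on the index
print(df_with_label)

# Second iteration: first_df's column is now labelled 1, same as second_df's.
first_df = pd.DataFrame({1: [10.72]}, index=[3])
error_message = None
try:
    first_df.join(second_df)
except ValueError as err:
    error_message = str(err)  # "columns overlap but no suffix specified: ..."
print(error_message)
```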
The best solution would be to give the second column a different name, like:
       1  class
3  10.72      M
Any idea how to fix it?
Upvotes: 0
Views: 51
Reputation: 1130
If I understand correctly, your second dataframe doesn't change between iterations, so why not just rename its column once and for all:
second_df.columns=['colname']
This should solve your naming conflicts.
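A sketch of that approach, assuming the same kind of hypothetical stand-in frames as in the question and using "class" as the new name (any label that can never equal first_df's integer column label would do). The rename happens once, before the loop, so every later join is free of overlaps.

```python
import pandas as pd

# Hypothetical stand-in for second_df (its class column starts out labelled 1)
labels = ["M"] * 443
labels[440] = "B"
second_df = pd.DataFrame({1: labels})

# Rename the class column once, before iterating
second_df.columns = ["class"]

results = []
# Simulate two iterations where first_df's column label grows: 0, then 1
for col, row, value in [(0, 440, 7.691), (1, 3, 10.72)]:
    first_df = pd.DataFrame({col: [value]}, index=[row])
    # No overlap: first_df's integer label never equals "class"
    results.append(first_df.join(second_df))

print(results[1])  # columns 1 and "class"; row 3 gets class "M"
```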
Upvotes: 1
Reputation: 13387
Try:
df_with_label = first_df.join(self.second_df, rsuffix = "_2")
The thing is, first_df and second_df both have a column 1, so rsuffix will append "_2" to second_df's column name: 1 becomes "1_2". Since you join on the indexes, every column from both frames appears in the result by default, so you need to avoid naming conflicts.
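A minimal sketch of the rsuffix fix, assuming hypothetical stand-in frames that mimic the second iteration from the question. The suffix is applied only to overlapping columns of the right frame, so second_df's column shows up as "1_2" in the result.

```python
import pandas as pd

# Hypothetical stand-in for second_df: class column labelled 1, indexed 0..442
labels = ["M"] * 443
labels[440] = "B"
second_df = pd.DataFrame({1: labels})

# Second iteration: both frames now have a column labelled 1
first_df = pd.DataFrame({1: [10.72]}, index=[3])

# rsuffix disambiguates the right frame's overlapping column 1 as "1_2"
df_with_label = first_df.join(second_df, rsuffix="_2")
print(df_with_label)
```

Note that this changes the label of the class column, so code that later looks it up must use "1_2" (or whatever suffix you pick); renaming second_df's column once, as in the other answer, keeps the label fixed.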
REF https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html
Upvotes: 1