Jacques Tebeka
Jacques Tebeka

Reputation: 321

merge dataframe with the same columns name

Hi i have a dataframe that looks like that :

Unnamed: 0 X1 Unnamed: 1 X2 Unnamed: 1 X3 Unnamed: 2 X4
1970-01-31 5.0 1970-01-31 1.0 1970-01-31 1.0 1980-01-30 1.0
1970-02-26 6.0 1970-02-26 3.0 1970-02-26 3.0 1980-02-26 3.0

I have many columns (631) that looks like that.

I would like to have :

date X1 X2 X3 X4
1970-01-31 5.0 1.0 1.0 na
1970-02-26 6.0 3.0 3.0 na
1980-01-30 na na na 1.0
1980-02-26 na na na 3.0

I tried :

res_df = pd.concat(
    df2[[date, X]].rename(columns={date: "date"}) for date, X in zip(df2.columns[::2], 
    df2.columns[1::2])
    ).pivot_table(index="date")

It works for small data but do not work for mine. Maybe because I have the same columns name 'Unnamed: 1' in my df. I have a message error:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Upvotes: 0

Views: 67

Answers (1)

jezrael
jezrael

Reputation: 863801

Crete index by date varible and use axis=1 in concat:

res_df = (pd.concat((df2[[date, X]].set_index(date) 
                     for date, X in zip(df2.columns[::2],  df2.columns[1::2])), axis=1)
            .rename_axis('date')
            .reset_index())
print (res_df)
         date   X1   X2   X3   X4
0  1970-01-31  5.0  1.0  1.0  NaN
1  1970-02-26  6.0  3.0  3.0  NaN
2  1980-01-30  NaN  NaN  NaN  1.0
3  1980-02-26  NaN  NaN  NaN  3.0

EDIT: Error seems like duplicated columns names in your DataFrame, possible solution is deduplicated before apply solution above:

df = pd.DataFrame(columns=['a','a','b'], index=[0])

#you can test if duplicated columns names
print (df.columns[df.columns.duplicated(keep=False)])
Index(['a', 'a'], dtype='object')

#https://stackoverflow.com/a/43792894/2901002
df.columns = pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
print (df.columns)
Index(['a', 'a.1', 'b'], dtype='object')

Upvotes: 1

Related Questions