Reputation: 321
Hi i have a dataframe that looks like that :
Unnamed: 0 | X1 | Unnamed: 1 | X2 | Unnamed: 1 | X3 | Unnamed: 2 | X4 |
---|---|---|---|---|---|---|---|
1970-01-31 | 5.0 | 1970-01-31 | 1.0 | 1970-01-31 | 1.0 | 1980-01-30 | 1.0 |
1970-02-26 | 6.0 | 1970-02-26 | 3.0 | 1970-02-26 | 3.0 | 1980-02-26 | 3.0 |
I have many columns (631) that looks like that.
I would like to have :
date | X1 | X2 | X3 | X4 |
---|---|---|---|---|
1970-01-31 | 5.0 | 1.0 | 1.0 | na |
1970-02-26 | 6.0 | 3.0 | 3.0 | na |
1980-01-30 | na | na | na | 1.0 |
1980-02-26 | na | na | na | 3.0 |
I tried :
res_df = pd.concat(
df2[[date, X]].rename(columns={date: "date"}) for date, X in zip(df2.columns[::2],
df2.columns[1::2])
).pivot_table(index="date")
It works for small data but do not work for mine. Maybe because I have the same columns name 'Unnamed: 1' in my df. I have a message error:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Upvotes: 0
Views: 67
Reputation: 863801
Crete index by date
varible and use axis=1
in concat
:
res_df = (pd.concat((df2[[date, X]].set_index(date)
for date, X in zip(df2.columns[::2], df2.columns[1::2])), axis=1)
.rename_axis('date')
.reset_index())
print (res_df)
date X1 X2 X3 X4
0 1970-01-31 5.0 1.0 1.0 NaN
1 1970-02-26 6.0 3.0 3.0 NaN
2 1980-01-30 NaN NaN NaN 1.0
3 1980-02-26 NaN NaN NaN 3.0
EDIT: Error seems like duplicated columns names in your DataFrame, possible solution is deduplicated before apply solution above:
df = pd.DataFrame(columns=['a','a','b'], index=[0])
#you can test if duplicated columns names
print (df.columns[df.columns.duplicated(keep=False)])
Index(['a', 'a'], dtype='object')
#https://stackoverflow.com/a/43792894/2901002
df.columns = pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
print (df.columns)
Index(['a', 'a.1', 'b'], dtype='object')
Upvotes: 1