Reputation: 361
With my code I integrate 2 databases in 1. The problem is when I add one more column to my databases, the result is not as expected. Use Python 2.7
code:
import pandas as pd
import pandas.io.formats.excel
import numpy as np
# Leemos ambos archivos y los cargamos en DataFrames
df1 = pd.read_excel("archivo1.xlsx")
df2 = pd.read_excel("archivo2.xlsx")
df = (pd.concat([df1,df2])
.set_index(["Cliente",'Fecha'])
.stack()
.unstack(1)
.sort_index(ascending=(True, False)))
m = df.index.get_level_values(1) == 'Impresiones'
df.index = np.where(m, 'Impresiones', df.index.get_level_values(0))
# Creamos el xlsx de salida
pandas.io.formats.excel.header_style = None
with pd.ExcelWriter("Data.xlsx",
engine='xlsxwriter',
date_format='dd/mm/yyyy',
datetime_format='dd/mm/yyyy') as writer:
df.to_excel(writer, sheet_name='Sheet1')
archivo1:
Fecha Cliente Impresiones Impresiones 2 Revenue
20/12/17 Jose 1312 35 $12
20/12/17 Martin 12 56 $146
20/12/17 Pedro 5443 124 $1,256
20/12/17 Esteban 667 1235 $1
archivo2:
Fecha Cliente Impresiones Impresiones 2 Revenue
21/12/17 Jose 25 5 $2
21/12/17 Martin 6347 523 $123
21/12/17 Pedro 2368 898 $22
21/12/17 Esteban 235 99 $7,890
Hope Results:
I tried with m1 = df.index.get_level_values(1) == 'Impresiones 2' df.index = np.where(m1, 'Impresiones 2', df.index.get_level_values(0))
but I have this error: IndexError: Too many levels: Index has only 1 level, not 2
Upvotes: 3
Views: 3077
Reputation: 402852
The first bit of the solution is similar to jezrael's answer to your previous question, using concat
+ set_index
+ stack
+ unstack
+ sort_index
.
df = pd.concat([df1, df2])\
.set_index(['Cliente', 'Fecha'])\
.stack()\
.unstack(-2)\
.sort_index(ascending=[True, False])
Now comes the challenging part, we have to incorporate the Names in the 0th level, into the 1st level, and then reset the index.
I use np.insert
to insert names above the revenue entry in the index.
i, j = df.index.get_level_values(0), df.index.get_level_values(1)
k = np.insert(j.values, np.flatnonzero(j == 'Revenue'), i.unique())
Now, I create a new MultiIndex
which I then use to reindex
df
-
idx = pd.MultiIndex.from_arrays([i.unique().repeat(len(df.index.levels[1]) + 1), k])
df = df.reindex(idx).fillna('')
Now, drop the extra level -
df.index = df.index.droplevel()
df
Fecha 20/12/17 21/12/17
Esteban
Revenue $1 $7,890
Impresiones2 1235 99
Impresiones 667 235
Jose
Revenue $12 $2
Impresiones2 35 5
Impresiones 1312 25
Martin
Revenue $146 $123
Impresiones2 56 523
Impresiones 12 6347
Pedro
Revenue $1,256 $22
Impresiones2 124 898
Impresiones 5443 2368
Upvotes: 3