Reputation: 361
With my code I can join 2 databases in one. Now, I need to do the same with another database file.
archivo1:
Fecha Cliente Impresiones Impresiones 2 Revenue
20/12/17 Jose 1312 35 $12
20/12/17 Martin 12 56 $146
20/12/17 Pedro 5443 124 $1,256
20/12/17 Esteban 667 1235 $1
archivo2:
Fecha Cliente Impresiones Impresiones 2 Revenue
21/12/17 Jose 25 5 $2
21/12/17 Martin 6347 523 $123
21/12/17 Pedro 2368 898 $22
21/12/17 Esteban 235 99 $7,890
archivo:
Fecha Cliente Impresiones Impresiones 2 Revenue
22/12/17 Peter 55 5 $2
22/12/17 Juan 634527 523 $123
22/12/17 Pedro 836 898 $22
22/12/17 Esteban 125 99 $7,890
I have this results:
The problem is that I need to add the new database(archivo
) into the Data.xlsx
file and it will look like:
Code:
import pandas as pd
import pandas.io.formats.excel
import numpy as np
# Leemos ambos archivos y los cargamos en DataFrames
df1 = pd.read_excel("archivo1.xlsx")
df2 = pd.read_excel("archivo2.xlsx")
df = pd.concat([df1, df2])\
.set_index(['Cliente', 'Fecha'])\
.stack()\
.unstack(-2)\
.sort_index(ascending=[True, False])
i, j = df.index.get_level_values(0), df.index.get_level_values(1)
k = np.insert(j.values, np.flatnonzero(j == 'Revenue'), i.unique())
idx = pd.MultiIndex.from_arrays([i.unique().repeat(len(df.index.levels[1]) + 1), k])
df = df.reindex(idx).fillna('')
df.index = df.index.droplevel()
# Creamos el xlsx de salida
pandas.io.formats.excel.header_style = None
with pd.ExcelWriter("Data.xlsx",
engine='xlsxwriter',
date_format='dd/mm/yyyy',
datetime_format='dd/mm/yyyy') as writer:
df.to_excel(writer, sheet_name='Sheet1')
Upvotes: 1
Views: 45
Reputation: 403050
Extending my comment as an answer, I'd recommend creating a function that will reshape your dataframes to conform to a given format. I'd recommend doing this simply because it is much easier to just reshape your data, rather than reshape new entries to conform to the existing structure. This is because your current structure is a format that makes it extremely hard to work with (take it from me).
So, the easiest thing to do would be to create a function -
def process(dfs):
df = pd.concat(dfs)\
.set_index(['Cliente', 'Fecha'])\
.stack()\
.unstack(-2)\
.sort_index(ascending=[True, False])
i = df.index.get_level_values(0)
j = df.index.get_level_values(1)
y = np.insert(j.values, np.flatnonzero(j == 'Revenue'), i.unique())
x = i.unique().repeat(len(df.index.levels[1]) + 1)
df = df.reindex(pd.MultiIndex.from_arrays([x, y])).fillna('')
df.index = df.index.droplevel()
return df
Now, load your dataframes -
df_list = []
for file in ['archivo1.xlsx', 'archivo2.xlsx', ...]:
df_list.append(pd.read_excel(file))
Now, call the process
function with your df_list
-
df = process(df_list)
df
Fecha 20/12/17 21/12/17
Esteban
Revenue $1 $7,890
Impresiones2 1235 99
Impresiones 667 235
Jose
Revenue $12 $2
Impresiones2 35 5
Impresiones 1312 25
Martin
Revenue $146 $123
Impresiones2 56 523
Impresiones 12 6347
Pedro
Revenue $1,256 $22
Impresiones2 124 898
Impresiones 5443 2368
Save df
to a new excel file. Repeat the process for every new dataframe that enters the system.
In summary, your entire code listing would look like this -
import pandas as pd
import pandas.io.formats.excel
import numpy as np
def process(dfs):
df = pd.concat(dfs)\
.set_index(['Cliente', 'Fecha'])\
.stack()\
.unstack(-2)\
.sort_index(ascending=[True, False])
i = df.index.get_level_values(0)
j = df.index.get_level_values(1)
y = np.insert(j.values, np.flatnonzero(j == 'Revenue'), i.unique())
x = i.unique().repeat(len(df.index.levels[1]) + 1)
df = df.reindex(pd.MultiIndex.from_arrays([x, y])).fillna('')
df.index = df.index.droplevel()
return df
if __name__ == '__main__':
df_list = []
for file in ['archivo1.xlsx', 'archivo2.xlsx']:
df_list.append(pd.read_excel(file))
df = process(df_list)
with pd.ExcelWriter("test.xlsx",
engine='xlsxwriter',
date_format='dd/mm/yyyy',
datetime_format='dd/mm/yyyy') as writer:
df.to_excel(writer, sheet_name='Sheet1')
The alternative to this tedious process is to change your dataset structure, and reconsider a more viable alternative that makes it much easier to add new data to existing data without having to keep reshaping everything from scratch. This is something you'll have to sit down and think about.
Upvotes: 3