Reputation: 33

Database with pandas: adding new data

I have a lot of Excel plains and I load them using pandas, process the data and as an output it writes all data in a Excel plain that is my "database".

The Database has to follow a pattern in the date index, e.g. 2017-01-01 (yyyy-mm-dd), 2017-01-02, 2017-01-03 ... 2017-12-31 ... and so on.

But the plains that are my inputs do not follow a rule with the date. My processing deals with it and do the correctly match with the input plain and output database indexes creating a new file: pd.to_excel('database\databaseFinal.xlsx'). My problem is adding new values to the existing database and still process the indexes to respect the pattern.

for example:

DATABASE.xlsx:

    date         Name1  Name2
    2017-01-01   23.2   18.4
    2017-01-02   21.5   27.7
    2017-01-03   0      0
    2017-01-04   0      0

plain input to update the database:

    date         Name1  
    2017-01-04   32.5

process data... after merging data:

    date         Name1_x  Name2  Name1_y
    2017-01-01   23.2     18.4   0
    2017-01-02   21.5     27.7   0
    2017-01-03   0        0      0
    2017-01-04   0        0      32.5

What I want:

    date         Name1  Name2  
    2017-01-01   23.2   18.4  
    2017-01-02   21.5   27.7   
    2017-01-03   0      0      
    2017-01-04   32.5   0

In this problem I must have as output an excel file. I know that must be an easy and efficient way of dealing with this, but I dont want to my work was in vain

Upvotes: 1

Answers (2)

P.Tillmann

Reputation: 2110

Instead of using merge you can simple append and fill the NAN values with zero.

df1
         date  Name1  Name2
0  2017-01-01   23.2   18.4
1  2017-01-02   21.5   27.7
2  2017-01-03    0.0    0.0
3  2017-01-04    0.0    0.0
df2
         date  Name1
0  2017-01-04   32.5

df1.append(df2).fillna(0)
   Name1  Name2        date
0   23.2   18.4  2017-01-01
1   21.5   27.7  2017-01-02
2    0.0    0.0  2017-01-03
3    0.0    0.0  2017-01-04
0   32.5    0.0  2017-01-04

If you always want to keep the value from the second dataframe you can use drop_duplicate with date as subset:

df1.append(df2).fillna(0).drop_duplicates(subset=['date'], keep='last')
   Name1  Name2        date
0   23.2   18.4  2017-01-01
1   21.5   27.7  2017-01-02
2    0.0    0.0  2017-01-03
0   32.5    0.0  2017-01-04

Upvotes: 1

Sean

Reputation: 215

# Make the dataframe
df = pd.DataFrame([['2017-01-01', 23.2, 18.4],
['2017-01-02', 21.5, 27.7],
['2017-01-03', 0.0, 0.0],
['2017-01-04', 0.0, 0.0]]) 
df.columns = ["date","Name1","Name2"] 
df.index = df["date"] 
df = df.drop("date",axis=1)

# Change the value
df.loc["2017-01-04"]["Name1"] = 32.5

Upvotes: 1

Database with pandas: adding new data

Answers (2)

Related Questions