Subhajit Kundu

Reputation: 403

How to add rows with no values for some columns

I am using Python 3.6.4 and pandas 0.23.0. I checked the pandas 0.23.0 documentation for the DataFrame constructor and for append, but neither mentions how to handle missing values, and I could not find a similar example.

Consider the following code:

import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

index_yrs = [2016, 2017, 2018]

r2016 = [26, 27, 25, 22, 20, 23, 22, 20, 20, 18, 18, 19]
r2017 = [20, 21, 18, 16, 15, 15, 15, 15, 13, 13, 14, 15]
r2018 = [16,  18,  18,  18,  17]

df = pd.DataFrame([r2016], columns = months, index = [index_yrs[0]])
df = df.append(pd.DataFrame([r2017], columns = months, index = [index_yrs[1]]))

Now, how do I add r2018, which only has data through May?

Upvotes: 1

Views: 81

Answers (2)

jpp

Reputation: 164623

You can add a row using pd.DataFrame.loc with a series. You only need to convert your list into a pd.Series object, indexed by the columns it covers, before adding the row:

df.loc[index_yrs[2]] = pd.Series(r2018, index=df.columns[:len(r2018)])

print(df)

       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2016  26.0  27.0  25.0  22.0  20.0  23.0  22.0  20.0  20.0  18.0  18.0  19.0
2017  20.0  21.0  18.0  16.0  15.0  15.0  15.0  15.0  13.0  13.0  14.0  15.0
2018  16.0  18.0  18.0  18.0  17.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN
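The label alignment that makes this work can be seen in a self-contained sketch. It assumes a pandas version where enlarging a frame through .loc accepts a Series, as in the output above. The two complete years are built in a single constructor call rather than with df.append, which was removed in pandas 2.0. The name partial is only illustrative.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

r2016 = [26, 27, 25, 22, 20, 23, 22, 20, 20, 18, 18, 19]
r2017 = [20, 21, 18, 16, 15, 15, 15, 15, 13, 13, 14, 15]
r2018 = [16, 18, 18, 18, 17]

# Build the two complete years in one constructor call.
df = pd.DataFrame([r2016, r2017], columns=months, index=[2016, 2017])

# Label the partial year with only the months it covers (Jan to May).
partial = pd.Series(r2018, index=months[:len(r2018)])

# Assigning to a new row label aligns on column names;
# months that are absent from the Series become NaN.
df.loc[2018] = partial

print(df.loc[2018])
```

Because the Series carries its own labels, the values land in the right columns even if the months it covers are not the leading ones.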

However, I strongly recommend you form a list of lists (with padding) before a single append. This is because list.append, or construction via a list comprehension, is cheap relative to repeated pd.DataFrame.append or pd.DataFrame.loc.

The above solution is recommended if you absolutely must add one row at a time.
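A minimal sketch of the padded list-of-lists approach, assuming every row is at most as long as months. The names rows and padded are illustrative and do not appear in the question.

```python
import numpy as np
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
index_yrs = [2016, 2017, 2018]

rows = [
    [26, 27, 25, 22, 20, 23, 22, 20, 20, 18, 18, 19],
    [20, 21, 18, 16, 15, 15, 15, 15, 13, 13, 14, 15],
    [16, 18, 18, 18, 17],
]

# Pad each row with NaN up to len(months). The comprehension builds
# new lists, so the original rows are left unchanged.
padded = [row + [np.nan] * (len(months) - len(row)) for row in rows]

# Construct the whole frame in one call instead of growing it row by row.
df = pd.DataFrame(padded, columns=months, index=index_yrs)
print(df)
```

All the cheap work happens on plain Python lists, and pandas is asked to allocate the frame only once.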

Upvotes: 0

L.P. Whigley

Reputation: 126

I agree with RafaelC that padding your 2018 list with NaNs for the missing values is the best way to do this. You can use np.nan from NumPy (which is already installed, since pandas depends on it) to generate NaNs.

import pandas as pd
import numpy as np

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

index_yrs = [2016, 2017, 2018]

As a small change to your code, I've put the data for all three years into a years list, which we can pass as the data parameter of pd.DataFrame. This eliminates the need to append each row to the previous ones.

r2016 = [26, 27, 25, 22, 20, 23, 22, 20, 20, 18, 18, 19]
r2017 = [20, 21, 18, 16, 15, 15, 15, 15, 13, 13, 14, 15]
r2018 = [16,  18,  18,  18,  17]
years = [r2016, r2017, r2018]

This is what years looks like: [[26, 27, 25, 22, 20, 23, 22, 20, 20, 18, 18, 19], [20, 21, 18, 16, 15, 15, 15, 15, 13, 13, 14, 15], [16, 18, 18, 18, 17]].

To pad 2018 with NaNs, something like this should do the trick. It ensures that if a year only has values for its first n months, the remaining months are filled with NaNs. Note that extend modifies the lists in place, so r2018 itself ends up padded too.

for year in years:
    if len(year) < 12:
        year.extend([np.nan] * (12 - len(year)))

Finally, we can create your dataframe with the one-liner below instead of appending row by row.

df = pd.DataFrame(years, columns=months, index=index_yrs).astype(float)

Output:

       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2016  26.0  27.0  25.0  22.0  20.0  23.0  22.0  20.0  20.0  18.0  18.0  19.0
2017  20.0  21.0  18.0  16.0  15.0  15.0  15.0  15.0  13.0  13.0  14.0  15.0
2018  16.0  18.0  18.0  18.0  17.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN

You may notice that I converted the values in the dataframe to float using .astype(float). I did this so that all columns have the same dtype. Without .astype(float), Jan-May would be dtype int64 and Jun-Dec would be dtype float64, because NaN is a float and only the columns containing it get upcast.
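The dtype split can be checked on a cut-down frame. This is a small sketch using only two of the month columns; the names data, raw and uniform are illustrative.

```python
import numpy as np
import pandas as pd

# Two columns are enough to show the effect:
# "May" is complete, "Jun" is missing its 2018 value.
data = [[20, 23], [15, 15], [17, np.nan]]
raw = pd.DataFrame(data, columns=["May", "Jun"], index=[2016, 2017, 2018])

# dtypes are inferred per column, so only "Jun" is upcast to float.
print(raw.dtypes)

# Casting the whole frame gives every column the same float dtype.
uniform = raw.astype(float)
print(uniform.dtypes)
```

Whether the mixed dtypes matter depends on what you do next. Printing the frame is the most visible difference, since integer columns display without a decimal point.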

Upvotes: 4
