Reputation: 403
I am using Python 3.6.4 and pandas 0.23.0. I have consulted the pandas 0.23.0 documentation for the DataFrame constructor and for append, but neither mentions how missing values are handled, and I could not find a similar example.
Consider the following code:
import pandas as pd
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
index_yrs = [2016, 2017, 2018]
r2016 = [26, 27, 25, 22, 20, 23, 22, 20, 20, 18, 18, 19]
r2017 = [20, 21, 18, 16, 15, 15, 15, 15, 13, 13, 14, 15]
r2018 = [16, 18, 18, 18, 17]
df = pd.DataFrame([r2016], columns = months, index = [index_yrs[0]])
df = df.append(pd.DataFrame([r2017], columns = months, index = [index_yrs[1]]))
Now, how can I add r2018, which only has data through May?
Upvotes: 1
Views: 81
Reputation: 164623
You can add a row using pd.DataFrame.loc via a Series. So you only need to convert your list into a pd.Series object, indexed by the matching column labels, before adding the row:
df.loc[index_yrs[2]] = pd.Series(r2018, index=df.columns[:len(r2018)])
print(df)
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2016  26.0  27.0  25.0  22.0  20.0  23.0  22.0  20.0  20.0  18.0  18.0  19.0
2017  20.0  21.0  18.0  16.0  15.0  15.0  15.0  15.0  13.0  13.0  14.0  15.0
2018  16.0  18.0  18.0  18.0  17.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN
However, I strongly recommend you form a list of lists (with padding) first and then add everything to the DataFrame in a single step. This is because list.append, or construction via a list comprehension, is cheap relative to repeated pd.DataFrame.append or pd.DataFrame.loc calls, each of which has to build a new, larger object.
The above solution is recommended only if you absolutely must add one row at a time.
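The list-of-lists recommendation above could be sketched roughly as follows. This is only an illustration: pad_row is a hypothetical helper name introduced here, not part of pandas, and it assumes missing months always fall at the end of a row.

```python
import numpy as np
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
index_yrs = [2016, 2017, 2018]
r2016 = [26, 27, 25, 22, 20, 23, 22, 20, 20, 18, 18, 19]
r2017 = [20, 21, 18, 16, 15, 15, 15, 15, 13, 13, 14, 15]
r2018 = [16, 18, 18, 18, 17]


def pad_row(values, width, fill=np.nan):
    """Hypothetical helper: return a new list right-padded with `fill` up to `width`."""
    return list(values) + [fill] * (width - len(values))


# Pad every row in plain Python first (cheap), then build the DataFrame once.
rows = [pad_row(r, len(months)) for r in (r2016, r2017, r2018)]
df = pd.DataFrame(rows, columns=months, index=index_yrs)
print(df)
```

Because pad_row returns a new list, the original r2018 is left unchanged, which may matter if the raw lists are reused elsewhere.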
Upvotes: 0
Reputation: 126
I agree with RafaelC that padding your 2018 list with NaNs for the missing values is the best way to do this. You can use np.nan from NumPy (which you will already have installed, since pandas depends on it) to generate NaNs.
import pandas as pd
import numpy as np
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
index_yrs = [2016, 2017, 2018]
As a small change to your code, I've put the data for all three years into a years list, which we can pass as the data parameter of pd.DataFrame. This eliminates the need to append each row to the previous ones.
r2016 = [26, 27, 25, 22, 20, 23, 22, 20, 20, 18, 18, 19]
r2017 = [20, 21, 18, 16, 15, 15, 15, 15, 13, 13, 14, 15]
r2018 = [16, 18, 18, 18, 17]
years = [r2016] + [r2017] + [r2018]
This is what years looks like: [[26, 27, 25, 22, 20, 23, 22, 20, 20, 18, 18, 19], [20, 21, 18, 16, 15, 15, 15, 15, 13, 13, 14, 15], [16, 18, 18, 18, 17]].
As for padding 2018 with NaNs, something like this should do the trick. It ensures that if a year only has values for its first n months, the remaining months are filled with NaNs.
for year in years:
    if len(year) < 12:
        # Pad in place so every year has 12 monthly values.
        year.extend([np.nan] * (12 - len(year)))
Finally, we can create your DataFrame using the one-liner below instead of appending row by row.
df = pd.DataFrame(years, columns=months, index=index_yrs).astype(float)
Output:
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2016  26.0  27.0  25.0  22.0  20.0  23.0  22.0  20.0  20.0  18.0  18.0  19.0
2017  20.0  21.0  18.0  16.0  15.0  15.0  15.0  15.0  13.0  13.0  14.0  15.0
2018  16.0  18.0  18.0  18.0  17.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN
You may notice that I converted the values in the DataFrame to float using .astype(float). I did this so that all of your columns share the same dtype. Without .astype(float), Jan-May would have dtype int64 and Jun-Dec would have dtype float64, because only the columns that contain NaN are upcast to float.
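To see the dtype difference for yourself, a quick check along these lines may help. It is a minimal sketch that rebuilds the padded data inline; the names raw and as_float are just illustrative.

```python
import numpy as np
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
years = [
    [26, 27, 25, 22, 20, 23, 22, 20, 20, 18, 18, 19],
    [20, 21, 18, 16, 15, 15, 15, 15, 13, 13, 14, 15],
    [16, 18, 18, 18, 17] + [np.nan] * 7,  # 2018, padded through December
]

# Without astype: dtypes are inferred column by column.
raw = pd.DataFrame(years, columns=months, index=[2016, 2017, 2018])
print(raw.dtypes)       # integer dtype for Jan-May, float64 for Jun-Dec

# With astype(float): every column ends up as float64.
as_float = raw.astype(float)
print(as_float.dtypes)
```

A uniform dtype is mostly a convenience: it keeps the printed output consistent and avoids surprises when later operations treat integer and float columns differently.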
Upvotes: 4