Reputation: 2056
I am trying to append an empty row at the end of a dataframe but am unable to do so. I have tried to understand how pandas works with the append function and I still don't get it.
Here's the code:
import pandas as pd
excel_names = ["ARMANI+EMPORIO+AR0143-book.xlsx"]
excels = [pd.ExcelFile(name) for name in excel_names]
frames = [x.parse(x.sheet_names[0], header=None,index_col=None).dropna(how='all') for x in excels]
for f in frames:
    f.append(0, float('NaN'))
    f.append(2, float('NaN'))
There are two columns and a variable number of rows.
With "print f" in the for loop I get this:
0 1
0 Brand Name Emporio Armani
2 Model number AR0143
4 Part Number AR0143
6 Item Shape Rectangular
8 Dial Window Material Type Mineral
10 Display Type Analogue
12 Clasp Type Buckle
14 Case Material Stainless steel
16 Case Diameter 31 millimetres
18 Band Material Leather
20 Band Length Women's Standard
22 Band Colour Black
24 Dial Colour Black
26 Special Features second-hand
28 Movement Quartz
Upvotes: 56
Views: 147247
Reputation: 11
@Dave Reikher's answer is the best solution.
df.loc[df.iloc[-1].name + 1,:] = np.nan
Here's a similar answer without the NumPy library:
df.loc[len(df.index)] = ['' for x in df.columns.values.tolist()]
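len(df.index) is the number of rows. With the default 0, 1, 2, ... index it is always 1 more than the last index label.
df.loc[len(df.index)] selects the next available index number (row).
df.iloc[-1].name + 1 equals len(df.index) under the same assumption.
df.columns.values.tolist() is the list of column names.
['' for x in df.columns.values.tolist()] builds one empty string per column.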
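To make the explanation above concrete, here is a minimal sketch. The frame and its column names (key, value) are made up for illustration, and it assumes the default 0, 1, 2, ... index. If the index labels have gaps, len(df.index) may already exist as a label, and the assignment would then overwrite that row instead of adding one.

```python
import pandas as pd

# Hypothetical two-column frame with the default 0..n-1 integer index.
df = pd.DataFrame({"key": ["Brand Name", "Model number"],
                   "value": ["Emporio Armani", "AR0143"]})

# On a default index both expressions name the same, not yet used, label.
assert len(df.index) == df.iloc[-1].name + 1

# One empty string per column; assigning to a missing label adds the row in place.
df.loc[len(df.index)] = ["" for _ in df.columns]

print(df)
```

Keep in mind that the new cells hold empty strings, not NaN, so df.isna() will not flag them as missing.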
Upvotes: 0
Reputation: 2361
Append "empty" row to data frame and fill selected cells:
Generate an empty data frame (no rows, just columns a and b):
import pandas as pd
col_names = ["a","b"]
df = pd.DataFrame(columns = col_names)
Append empty row at the end of the data frame:
df = df.append(pd.Series(), ignore_index = True)
Now fill the empty cell at the end (len(df)-1) of the data frame in column a:
df.loc[[len(df)-1],'a'] = 123
Result:
a b
0 123 NaN
And of course one can iterate over the rows and fill cells:
col_names = ["a","b"]
df = pd.DataFrame(columns = col_names)
for x in range(0,5):
    df = df.append(pd.Series(), ignore_index = True)
    df.loc[[len(df)-1],'a'] = 123
Result:
a b
0 123 NaN
1 123 NaN
2 123 NaN
3 123 NaN
4 123 NaN
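DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, and each call copied the whole frame anyway. As a rough sketch of the same loop on newer versions, one option (my substitution, not the approach used above) is to collect the rows in a plain list and build the frame once at the end:

```python
import numpy as np
import pandas as pd

col_names = ["a", "b"]

# Build each row as a plain dict; column "b" is left missing on purpose.
rows = [{"a": 123, "b": np.nan} for _ in range(5)]

# Construct the frame once instead of growing it inside the loop.
df = pd.DataFrame(rows, columns=col_names)

print(df)
```

This gives five rows with 123 in column a and NaN in column b, like the result above.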
Upvotes: 5
Reputation: 1
You can also use:
your_dataframe.insert(loc=0, value=np.nan, column="")
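where loc is the position of the new column. Note that DataFrame.insert adds a column, not a row, so this produces an empty column rather than an empty row.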
Upvotes: 0
Reputation: 1954
Assuming df is your dataframe,
df_prime = pd.concat([df, pd.DataFrame([[np.nan] * df.shape[1]], columns=df.columns)], ignore_index=True)
where df_prime equals df with an additional last row of NaNs.
Note that pd.concat is slow, so if you need this functionality in a loop, it's best to avoid using it.
In that case, assuming your index is incremental, you can use
df.loc[df.iloc[-1].name + 1,:] = np.nan
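Here is a small sketch of the in-place variant on a frame whose labels have gaps, which is what the question's frame looks like after dropna(how='all'). The data is a made-up three-row excerpt, and I use the shorter df.loc[label] form, which I believe behaves the same as df.loc[label, :] here.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt with labels 0, 2, 4 (gaps left by dropped rows).
df = pd.DataFrame({0: ["Brand Name", "Model number", "Part Number"],
                   1: ["Emporio Armani", "AR0143", "AR0143"]},
                  index=[0, 2, 4])

new_label = df.iloc[-1].name + 1   # last label is 4, so the new row gets label 5
df.loc[new_label] = np.nan         # enlarges df in place, no concat needed

print(df.index.tolist())
```

Because the new label is derived from the last existing one, it cannot collide with an earlier row as long as the index is increasing.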
Upvotes: 5
Reputation: 1098
Add a new pandas.Series using pandas.DataFrame.append().
If you wish to specify the name (AKA the "index") of the new row, use:
df.append(pandas.Series(name='NameOfNewRow'))
If you don't wish to name the new row, use:
df.append(pandas.Series(), ignore_index=True)
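where df is your pandas.DataFrame. Note that append returns a new DataFrame rather than modifying df in place, so assign the result (df = df.append(...)).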
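On pandas 2.0 and later, where DataFrame.append is no longer available, a named empty row can be sketched with pd.concat instead. The frame, its labels, and the name blank are made up for illustration; this is one possible equivalent, not the only one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]}, index=["first", "second"])

# A one-row frame with the same columns, filled with NaN and carrying the new label.
blank = pd.DataFrame(np.nan, index=["NameOfNewRow"], columns=df.columns)

# concat returns a new frame, so the result has to be assigned.
df = pd.concat([df, blank])

print(df)
```

Passing ignore_index=True to pd.concat would renumber the rows instead of keeping the name.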
Upvotes: 74
Reputation: 160
Assuming your df.index is sorted you can use:
df.loc[df.index.max() + 1] = None
It works well with different index and column types.
[EDIT] It works with pd.DatetimeIndex if there is a constant frequency; otherwise we must specify the new index exactly, e.g.:
df.loc[df.index.max() + pd.Timedelta(milliseconds=1)] = None
Long example:
df = pd.DataFrame([[pd.Timestamp(12432423), 23, 'text_field']],
                  columns=["timestamp", "speed", "text"],
                  index=pd.date_range(start='2111-11-11', freq='ms', periods=1))
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1 entries, 2111-11-11 to 2111-11-11
Freq: L
Data columns (total 3 columns):
timestamp 1 non-null datetime64[ns]
speed 1 non-null int64
text 1 non-null object
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 32.0+ bytes
df.loc[df.index.max() + 1] = None
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2 entries, 2111-11-11 00:00:00 to 2111-11-11 00:00:00.001000
Data columns (total 3 columns):
timestamp 1 non-null datetime64[ns]
speed 1 non-null float64
text 1 non-null object
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 64.0+ bytes
df.head()
timestamp speed text
2111-11-11 00:00:00.000 1970-01-01 00:00:00.012432423 23.0 text_field
2111-11-11 00:00:00.001 NaT NaN NaN
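Recent pandas versions no longer allow adding a bare integer to a Timestamp, so as far as I know the + 1 step above only runs on older releases. A sketch of the same example with an explicit step (assuming a one millisecond spacing, as in the example) would be:

```python
import pandas as pd

df = pd.DataFrame([[pd.Timestamp(12432423), 23, "text_field"]],
                  columns=["timestamp", "speed", "text"],
                  index=pd.date_range(start="2111-11-11", freq="ms", periods=1))

# Step forward by an explicit Timedelta rather than a bare integer.
new_label = df.index.max() + pd.Timedelta(milliseconds=1)
df.loc[new_label] = None   # adds one row in which every cell is missing

print(df)
```

The resulting column dtypes may differ between pandas versions, so check df.dtypes if they matter to you.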
Upvotes: 1
Reputation: 355
You can add a new series, and name it at the same time. The name will be the index of the new row, and all the values will automatically be NaN.
df.append(pd.Series(name='Afterthought'))
Upvotes: 5
Reputation: 197
The code below worked for me.
df.append(pd.Series([np.nan]), ignore_index = True)
Upvotes: 3
Reputation: 1616
You can add it by appending a Series to the dataframe as follows. I am assuming that by blank you mean a row containing only NaN. First create a Series object filled with NaN, making sure you pass the column names as the index parameter of the Series. Then you can append it to the DataFrame. Hope it helps!
>>> from numpy import nan as Nan
>>> import pandas as pd
>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3'],
...                     'D': ['D0', 'D1', 'D2', 'D3']},
...                     index=[0, 1, 2, 3])
>>> s2 = pd.Series([Nan,Nan,Nan,Nan], index=['A', 'B', 'C', 'D'])
>>> result = df1.append(s2)
>>> result
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 NaN NaN NaN NaN
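The reason the Series index matters is that the values are matched to the frame's columns by name. A small sketch of that alignment, using pd.concat so it also runs where DataFrame.append has been removed; the shortened frame and the partially filled Series are my own variation on the example above:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})

# Only column A is given; B is deliberately left out.
s2 = pd.Series(["A4"], index=["A"])

# Turn the Series into a one-row frame whose columns are the Series index.
row = s2.to_frame().T
result = pd.concat([df1, row], ignore_index=True)

print(result)
```

The value lands in column A because of its index label, and the column that was not named ends up as NaN in the new row.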
Upvotes: 20