user2242044

Reputation: 9253

Drop all data in a pandas dataframe

I would like to drop all data in a pandas dataframe, but I am getting TypeError: drop() takes at least 2 arguments (3 given). I essentially want a blank dataframe with just my column headers.

import pandas as pd

web_stats = {'Day': [1, 2, 3, 4, 2, 6],
             'Visitors': [43, 43, 34, 23, 43, 23],
             'Bounce_Rate': [3, 2, 4, 3, 5, 5]}
df = pd.DataFrame(web_stats)

df.drop(axis=0, inplace=True)
print df

Upvotes: 92

Views: 245707

Answers (8)

Paras

Reputation: 17

You can overwrite using a blank dataframe while keeping the original column names.

df = pd.DataFrame(data=None, columns=df.columns)

Upvotes: 0

Dr. Venkata Goli

Reputation: 1

If you want to remove all data and columns, reassign the dataframe to an empty frame:

myDf = pd.DataFrame(None)  # does the trick

If you want to keep the column names, slice instead and assign the result back:

myDf = myDf.iloc[0:0]

Upvotes: 0

Matt

Reputation: 169

If your goal is to drop everything from the dataframe, you need to pass all of its columns. For me, the best way is to pass a list comprehension to the columns kwarg. This works regardless of which columns the df has.

import pandas as pd

web_stats = {'Day': [1, 2, 3, 4, 2, 6],
             'Visitors': [43, 43, 34, 23, 43, 23],
             'Bounce_Rate': [3, 2, 4, 3, 5, 5]}
df = pd.DataFrame(web_stats)

# drop() returns a new frame, so assign the result back
df = df.drop(columns=[col for col in df.columns])
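
One caveat worth checking: dropping every column is not the same as dropping every row. A minimal sketch, assuming the df from the question (the names no_columns and no_rows are only for illustration), suggests the column headers disappear while the six index labels stay:

```python
import pandas as pd

web_stats = {'Day': [1, 2, 3, 4, 2, 6],
             'Visitors': [43, 43, 34, 23, 43, 23],
             'Bounce_Rate': [3, 2, 4, 3, 5, 5]}
df = pd.DataFrame(web_stats)

# Dropping all columns keeps the row index but loses the headers
no_columns = df.drop(columns=list(df.columns))

# Slicing zero rows keeps the headers but loses the rows
no_rows = df.iloc[0:0]

print(no_columns.shape)  # (6, 0)
print(no_rows.shape)     # (0, 3)
```

So if the goal is a blank frame that still has its column headers, the row-based approaches fit the question better.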

Upvotes: 0

Sergey Svetloff

Reputation: 1

This code gives you a clean dataframe (note that the column names are lost as well):

df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
#clean
df = pd.DataFrame()

Upvotes: -4

Zisis F

Reputation: 362

Overwrite the dataframe with something like this:

import pandas as pd

df = pd.DataFrame(None)

or if you want to keep columns in place

df = pd.DataFrame(columns=df.columns)
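
A side effect to be aware of: rebuilding from the column names alone does not carry the column dtypes over. A small sketch, assuming a two-column integer frame (rebuilt and sliced are illustrative names), compares this with slicing:

```python
import pandas as pd

df = pd.DataFrame({'Day': [1, 2, 3], 'Visitors': [43, 43, 34]})

# Rebuilt from column names only: the columns come back as object dtype
rebuilt = pd.DataFrame(columns=df.columns)

# Zero-row slice of the original: the original dtypes are kept
sliced = df.iloc[0:0]

print(rebuilt.dtypes)
print(sliced.dtypes)
```

If later code relies on numeric dtypes, the slice may be the safer choice.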

Upvotes: 12

Raul Menendez

Reputation: 161

My favorite way is:

df = df[0:0] 

Upvotes: 16

user2285236

Reputation:

You need to pass the labels to be dropped.

df.drop(df.index, inplace=True)

By default, it operates on axis=0.

You can achieve the same result by assigning a zero-row slice back to df:

df = df.iloc[0:0]

which is much more efficient.
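
The two approaches differ in one way that can matter: drop with inplace=True empties the existing object, while the iloc slice returns a new frame and leaves the original alone. A rough sketch, assuming a second name (other_ref, made up for this example) points at the same frame:

```python
import pandas as pd

web_stats = {'Day': [1, 2, 3, 4, 2, 6],
             'Visitors': [43, 43, 34, 23, 43, 23],
             'Bounce_Rate': [3, 2, 4, 3, 5, 5]}

# In-place drop: every reference to the frame sees the rows disappear
df = pd.DataFrame(web_stats)
other_ref = df
df.drop(df.index, inplace=True)
rows_seen_after_drop = len(other_ref)

# Slice and reassign: only the name df is rebound, the old frame is untouched
df = pd.DataFrame(web_stats)
other_ref = df
df = df.iloc[0:0]
rows_seen_after_slice = len(other_ref)

print(rows_seen_after_drop, rows_seen_after_slice)  # 0 6
```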

Upvotes: 169

tomatom

Reputation: 469

My favorite:

df = df.iloc[0:0]

But be aware that df.index.max() will be nan on the empty frame. To add items I use the following (it needs import math):

df.loc[0 if math.isnan(df.index.max()) else df.index.max() + 1] = data
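
The same idea as a self-contained sketch. The helper name next_label and the sample rows are made up for illustration, and the .copy() is only there to avoid chained-assignment warnings on some pandas versions:

```python
import math

import pandas as pd


def next_label(frame):
    """Hypothetical helper: next integer label, or 0 for an empty frame."""
    current_max = frame.index.max()
    return 0 if math.isnan(current_max) else current_max + 1


# Start from an emptied frame that still has its column headers
df = pd.DataFrame({'Day': [1, 2], 'Visitors': [43, 34]}).iloc[0:0].copy()

# Append rows one at a time by label
df.loc[next_label(df)] = [7, 50]
df.loc[next_label(df)] = [8, 61]

print(df)
```

Appending row by row like this is fine for a handful of rows; for many rows, collecting them in a list and building the frame once is usually faster.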

Upvotes: 25
