Ken Williams

Reputation: 23975

Empty copy of Pandas DataFrame

I'm looking for an efficient idiom for creating a new Pandas DataFrame with the same columns and types as an existing DataFrame, but with no rows. The following works, but is presumably much less efficient than it could be, because it has to create a long indexing structure and then evaluate it for each row. I'm assuming that's O(n) in the number of rows, and I would like to find an O(1) solution (that's not too bad to look at).

out = df.loc[np.repeat(False, df.shape[0])].copy()

I have the copy() in there because I honestly have no idea under what circumstances I'm getting a copy or getting a view into the original.

For comparison in R, a nice idiom is to do df[0,], because there's no zeroth row. df[NULL,] also works.

Upvotes: 7

Views: 3498

Answers (4)

wwnde

Reputation: 26676

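Drop every row by passing the DataFrame's own index to drop. Whether you copy before or after dropping depends on the size of df.

# If df is small: copy first, then drop all rows.
# df1 = df.copy(deep=True).drop(df.index)
# If df is large: drop first, so the full frame is never copied and then discarded.
df1 = df.drop(df.index).copy()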
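
As a quick check of this approach, here is a minimal sketch on a toy frame (the column names and values are invented for illustration). It suggests that drop(df.index) leaves the columns and dtypes in place and does not modify the original:

```python
import pandas as pd

# Hypothetical example frame; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5], "c": ["x", "y", "z"]})

# drop() returns a new object, so the original frame is left untouched.
empty = df.drop(df.index)

print(empty.shape)                     # number of rows and columns in the result
print(empty.dtypes.equals(df.dtypes))  # whether the dtypes survived the drop
print(len(df))                         # the original still has its rows
```

Since drop already returns a new object, the trailing copy() is mostly a safety net.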

Upvotes: 0

Ruthger Righart

Reputation: 4921

df1 is the existing DataFrame:

df1 = pd.DataFrame({'x1':[1,2,3], 'x2':[4,5,6]})

df2 is the new DataFrame, built from the columns of df1:

df2 = pd.DataFrame({}, columns=df1.columns)

To set the dtype of each column to match df1:

for x in df1.columns:
    # Cast each empty column to the dtype of the matching column in df1.
    df2[x] = df2[x].astype(df1[x].dtype)
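
The loop above can probably be collapsed into a single call, since astype also accepts a mapping from column name to dtype. A minimal sketch, assuming a df1 with one integer and one float column (the float column is changed from the original example only to make the dtype difference visible):

```python
import pandas as pd

# Example source frame; x2 is made float here purely for illustration.
df1 = pd.DataFrame({"x1": [1, 2, 3], "x2": [4.0, 5.0, 6.0]})

# Build an empty frame with the same columns, then cast every column at once
# using a {column name: dtype} dictionary taken from df1.
df2 = pd.DataFrame(columns=df1.columns).astype(df1.dtypes.to_dict())

print(df2.dtypes)
print(len(df2))
```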

Upvotes: 3

Scott Boston

Reputation: 153460

Update: for a copy with no rows

Use reindex:

dfcopy = pd.DataFrame().reindex(columns=df.columns)
print(dfcopy)

Output:

Empty DataFrame
Columns: [a, b, c, d, e]
Index: []

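We can also use reindex_like. Note that this keeps the original index as well, so the result has the same shape as df and is filled with NaN rather than being empty: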

dfcopy = pd.DataFrame().reindex_like(df)

MCVE:

#Create dummy source dataframe
df = pd.DataFrame(np.arange(25).reshape(5,-1), index=[*'ABCDE'], columns=[*'abcde'])

dfcopy = pd.DataFrame().reindex_like(df)
print(dfcopy)

Output:

    a   b   c   d   e
A NaN NaN NaN NaN NaN
B NaN NaN NaN NaN NaN
C NaN NaN NaN NaN NaN
D NaN NaN NaN NaN NaN
E NaN NaN NaN NaN NaN
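
To see how this differs from a truly empty copy, the sketch below reuses the MCVE frame and prints the shape and dtypes of the result next to those of the source. No particular dtype outcome is assumed here; the point is only to compare the two:

```python
import numpy as np
import pandas as pd

# Same dummy source frame as in the MCVE above.
df = pd.DataFrame(np.arange(25).reshape(5, -1), index=[*"ABCDE"], columns=[*"abcde"])

# reindex_like copies both the row labels and the column labels from df.
dfcopy = pd.DataFrame().reindex_like(df)

print(dfcopy.shape)   # same shape as df, so the rows are still there
print(dfcopy.dtypes)  # compare with the source dtypes printed below
print(df.dtypes)
```

If the goal is zero rows with the source dtypes intact, it is worth checking those printed dtypes before relying on either reindex variant.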

Upvotes: 1

Umar.H

Reputation: 23099

I think the equivalent in pandas would be slicing with iloc:

df = pd.DataFrame({'A' : [0,1,2,3], 'B' : [4,5,6,7]})
print(df)
   A  B
0  0  4
1  1  5
2  2  6
3  3  7

df1 = df.iloc[:0].copy()

print(df1)
Empty DataFrame
Columns: [A, B]
Index: []
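
A short check, assuming a frame with mixed dtypes (column B is made float here for illustration), suggests that the zero-length slice keeps both the column names and the dtypes of the source:

```python
import pandas as pd

# Example frame with one integer and one float column (illustrative values).
df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [4.0, 5.0, 6.0, 7.0]})

# Slice zero rows by position; copy() makes the result independent of df.
empty = df.iloc[:0].copy()

print(empty.shape)
print(empty.dtypes.equals(df.dtypes))
```

The slice does not need a boolean mask with one entry per row, which is what the question was trying to avoid.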

Upvotes: 12
