Timothee W

Reputation: 796

Alternative to .concat() of empty dataframe, now that it is being deprecated?

I have two dataframes that can both be empty, and I want to concat them.

Before, I could just do:

output_df = pd.concat([df1, df2])

But now I run into:

FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

An easy fix would be:

if not df1.empty and not df2.empty:
    result_df = pd.concat([df1, df2], axis=0)
elif not df1.empty:
    result_df = df1.copy()
elif not df2.empty:
    result_df = df2.copy()
else:
    result_df = pd.DataFrame()

But that seems pretty ugly. Does anyone have a better solution?

FYI: this appeared after pandas released v2.1.0

Upvotes: 45

Views: 60653

Answers (7)

MR42

Reputation: 51

You can try something more universal:

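import pandas as pd

def pd_concat(dfs: list, **kwargs):
    # Concatenate only the non-empty frames, if there are any.
    D = [d for d in dfs if not d.empty]
    if D:
        return pd.concat(D, **kwargs)

    # All inputs are empty: build an empty frame with the union of their columns.
    E = pd.DataFrame()
    for d in dfs:
        cols = [c for c in d.columns if c not in E.columns]
        if cols:
            E[cols] = None
    return E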

Thus,

  1. no unexpected data type changes
  2. same syntax as pd.concat, e.g. pd_concat([df1, df2], axis=0)
  3. even when all DataFrames are empty, pd_concat returns an empty DataFrame with the union of their columns, as if they had been concatenated; so if you use the result as input to other operations or joins, it should not fail because of missing columns.
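
To illustrate point 3, here is a minimal sketch of the same idea written slightly differently. The helper name concat_keep_columns is hypothetical (not part of pandas), and it assumes row-wise concatenation; it builds the all-empty fallback from the ordered union of the input columns.

```python
import pandas as pd

def concat_keep_columns(dfs, **kwargs):
    """Hypothetical helper: concat the non-empty frames, else keep the column names."""
    non_empty = [d for d in dfs if not d.empty]
    if non_empty:
        return pd.concat(non_empty, **kwargs)
    # Every input is empty: return an empty frame with the ordered union of columns.
    cols = list(dict.fromkeys(c for d in dfs for c in d.columns))
    return pd.DataFrame(columns=cols)

empty_ab = pd.DataFrame(columns=["A", "B"])
empty_bc = pd.DataFrame(columns=["B", "C"])
result = concat_keep_columns([empty_ab, empty_bc])
print(list(result.columns))
```

Note that columns are only preserved in the all-empty case: when at least one frame is non-empty, columns that exist only in the empty frames are not carried over.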

Upvotes: 0

user27009853

Reputation:

Same functionality as the other answers, using filter:

df_list = filter(lambda x: not x.empty, df_list)

new_df = pd.concat(list(df_list))

Upvotes: 0

Timeless

Reputation: 37847

To be precise, concat is not deprecated (and won't be, IMHO), but I can trigger this FutureWarning in 2.1.1 with the following example, where df2 is an empty DataFrame with a different dtype than df1:

df1 = pd.DataFrame({"A": [.1, .2, .3]})
df2 = pd.DataFrame(columns=["A"], dtype="object")

out = pd.concat([df1, df2])
print(out)

     A
0  0.1
1  0.2
2  0.3

As a solution in your case, you can try something like what you did:

out = (df1.copy() if df2.empty else df2.copy() if df1.empty
       else pd.concat([df1, df2]) # if both DataFrames non empty
      )

Or maybe even this one:

out = pd.concat([df1.astype(df2.dtypes), df2.astype(df1.dtypes)])
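
If you want to check what your installed pandas actually does with the example above, a small sketch like the following records any warnings and prints the resulting dtype. Whether a FutureWarning shows up, and which dtype column A ends up with, depends on the pandas version, so no output is shown here.

```python
import warnings

import pandas as pd

df1 = pd.DataFrame({"A": [0.1, 0.2, 0.3]})
df2 = pd.DataFrame(columns=["A"], dtype="object")  # empty, with a different dtype

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = pd.concat([df1, df2])

future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
print("pandas", pd.__version__)
print("FutureWarning raised:", bool(future_warnings))
print(out.dtypes)
```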

Upvotes: 26

Tuhin Mitra

Reputation: 727

A pretty simple solution to resolve this warning is:

Define a dataframe like this,

df = pd.DataFrame()

Instead of this,

df = pd.DataFrame(columns=['A', 'B', 'C'])
# or df = pd.DataFrame([], columns=['A', 'B', 'C'])

Then, you can concat on this dataframe with other dataframes you have.

df = pd.concat([df, df_other])

It'll work perfectly fine now!
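
A short sketch of the difference between the two definitions, using made-up column names and data. The trade-off with pd.DataFrame() is that it carries no column names, so if nothing is ever concatenated onto it, the result has no columns at all.

```python
import pandas as pd

no_schema = pd.DataFrame()                           # no rows, no columns
with_schema = pd.DataFrame(columns=["A", "B", "C"])  # no rows, three columns
print(no_schema.shape, with_schema.shape)

# Illustrative data; the column names come entirely from df_other here.
df_other = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": ["x", "y"]})
out = pd.concat([no_schema, df_other])
print(list(out.columns))
```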

Upvotes: 12

Victor23d

Reputation: 101

Try this if you know that there might be an empty DataFrame in df_list:

df_list = [df1, df2, ...]

df = pd.concat([df for df in df_list if not df.empty])
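
One caveat with filtering: if every DataFrame in the list is empty, the filtered list is empty too, and pd.concat raises a ValueError when given no objects. A minimal sketch of that case with a simple fallback (the example frames are made up):

```python
import pandas as pd

df_list = [pd.DataFrame(columns=["A"]), pd.DataFrame(columns=["A"])]
non_empty = [df for df in df_list if not df.empty]

# pd.concat refuses an empty list of objects.
try:
    pd.concat(non_empty)
    raised = False
except ValueError:
    raised = True
print("concat of an empty list raised ValueError:", raised)

# Fall back to an empty DataFrame when nothing is left after filtering.
result = pd.concat(non_empty) if non_empty else pd.DataFrame()
```

The fallback here has no columns; whether that is acceptable depends on what the result is used for afterwards.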

Upvotes: 10

valentinmk

Reputation: 1021

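I found this solution, based on @Timeless's answer, the most "non-ugly" for me. It relies on pd.concat silently dropping None entries (it raises a ValueError only if all entries are None).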

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([], columns=['A', 'B'])

In [3]: df = pd.concat([
   ...:     df if not df.empty else None,
   ...:     pd.DataFrame([{'A': 1.1, 'B': 2.2}])
   ...: ])

In [4]: df
Out[4]: 
     A    B
0  1.1  2.2

Upvotes: 15

Si Mon

Reputation: 499

What about this more generic solution?

list_of_dfs = [df1, df2, dfx]
# now remove all columns from the dataframes which are empty or have all-NA 
cleaned_list_of_dfs = [df.dropna(axis=1, how='all') for df in list_of_dfs]
output_df = pd.concat(cleaned_list_of_dfs)

Or, with your example, in one line:

output_df = pd.concat(df.dropna(axis=1, how='all') for df in [df1, df2])

That said, you might want to clean those columns out in a more explicit cleaning step and not necessarily during concatenation. A user probably doesn't expect some columns to disappear during concatenation, and that is likely why this behavior is being removed in future versions of pandas.
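
To make the "columns disappear" point concrete, here is a small sketch with made-up data: column B is all-NA in the first frame and absent from the second, so after dropna it is missing from the result entirely.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1.0, 2.0], "B": [np.nan, np.nan]})  # B is all-NA
df2 = pd.DataFrame({"A": [3.0], "C": ["x"]})

# Drop columns that contain only missing values before concatenating.
cleaned = [df.dropna(axis=1, how="all") for df in (df1, df2)]
out = pd.concat(cleaned, ignore_index=True)
print(list(out.columns))  # B is no longer present
```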

Upvotes: 7
