Reputation: 796
I have two dataframes that can both be empty, and I want to concat them.
Before, I could just do:
output_df = pd.concat([df1, df2])
But now I run into
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
An easy fix would be:
if not df1.empty and not df2.empty:
    result_df = pd.concat([df1, df2], axis=0)
elif not df1.empty:
    result_df = df1.copy()
elif not df2.empty:
    result_df = df2.copy()
else:
    result_df = pd.DataFrame()
But that seems pretty ugly. Does anyone have a better solution?
FYI: this appeared after pandas released v2.1.0
Upvotes: 45
Views: 60653
Reputation: 51
You can try something more universal:
def pd_concat(dfs: list, **kwargs):
    non_empty = [d for d in dfs if not d.empty]
    if non_empty:
        return pd.concat(non_empty, **kwargs)
    # All inputs are empty: return an empty frame with the union of columns.
    result = pd.DataFrame()
    for d in dfs:
        cols = [c for c in d.columns if c not in result.columns]
        if cols:
            result[cols] = None
    return result
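For illustration, here is a self-contained run of this helper (the sample frames `df1`/`df2` are made up for the demo):

```python
import pandas as pd

def pd_concat(dfs: list, **kwargs):
    non_empty = [d for d in dfs if not d.empty]
    if non_empty:
        return pd.concat(non_empty, **kwargs)
    # All inputs are empty: return an empty frame with the union of columns.
    result = pd.DataFrame()
    for d in dfs:
        cols = [c for c in d.columns if c not in result.columns]
        if cols:
            result[cols] = None
    return result

df1 = pd.DataFrame({"A": [1, 2]})
df2 = pd.DataFrame(columns=["A", "B"])  # empty

out = pd_concat([df1, df2])
print(out)  # only df1's rows; the empty frame is filtered out before concat
```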
Upvotes: 0
Reputation:
Same functionality as the other answers:
df_list = filter(lambda x: not x.empty, df_list)
new_df = pd.concat(list(df_list))
Upvotes: 0
Reputation: 37847
To be precise, concat is not deprecated (and won't be, IMHO), but I can trigger this FutureWarning in 2.1.1 with the following example, where df2 is an empty DataFrame with different dtypes than df1:
df1 = pd.DataFrame({"A": [.1, .2, .3]})
df2 = pd.DataFrame(columns=["A"], dtype="object")
out = pd.concat([df1, df2])
print(out)
A
0 0.1
1 0.2
2 0.3
As a solution in your case, you can do something like you already did:
out = (df1.copy() if df2.empty else df2.copy() if df1.empty
       else pd.concat([df1, df2])  # if both DataFrames are non-empty
      )
Or maybe even this one? :
out = pd.concat([df1.astype(df2.dtypes), df2.astype(df1.dtypes)])
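For instance, with the df1/df2 from above, casting each frame to the other's dtypes keeps all three rows (a quick sketch; it doesn't assert which dtype wins in the result):

```python
import pandas as pd

df1 = pd.DataFrame({"A": [.1, .2, .3]})            # float64 column
df2 = pd.DataFrame(columns=["A"], dtype="object")  # empty, object column

# Cast each frame to the other's dtypes before concatenating, so the
# empty frame no longer carries a conflicting dtype into the concat.
out = pd.concat([df1.astype(df2.dtypes), df2.astype(df1.dtypes)])
print(out)
```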
Upvotes: 26
Reputation: 727
A pretty simple solution to resolve this warning is:
Define a dataframe like this,
df = pd.DataFrame()
Instead of this,
df = pd.DataFrame(columns=['A', 'B', 'C'])
# or df = pd.DataFrame([], columns=['A', 'B', 'C'])
Then you can concat this dataframe with the other dataframes you have.
df = pd.concat([df, df_other])
It'll work perfectly fine now!
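For example (with a made-up df_other):

```python
import pandas as pd

# A column-less empty frame contributes no empty/all-NA columns,
# so the concat result simply takes df_other's shape and dtypes.
df = pd.DataFrame()
df_other = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})

df = pd.concat([df, df_other])
print(df)
```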
Upvotes: 12
Reputation: 101
Try this if you know there might be empty dataframes in df_list:
df_list = [df1, df2, ...]
df = pd.concat([df for df in df_list if not df.empty])
Upvotes: 10
Reputation: 1021
I found this solution, based on @Timeless's answer, the most "non-ugly" for me:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([], columns=['A', 'B'])
In [3]: df = pd.concat([
...: df if not df.empty else None,
...: pd.DataFrame([{'A': 1.1, 'B': 2.2}])
...: ])
In [4]: df
Out[4]:
A B
0 1.1 2.2
Upvotes: 15
Reputation: 499
What about this more generic solution?:
list_of_dfs = [df1, df2, dfx]
# now remove all columns from the dataframes which are empty or have all-NA
cleaned_list_of_dfs = [df.dropna(axis=1, how='all') for df in list_of_dfs]
output_df = pd.concat(cleaned_list_of_dfs)
or with your example in one line:
output_df = pd.concat(df.dropna(axis=1, how='all') for df in [df1, df2])
That said, you might want to clean those columns out in a more explicit cleaning step rather than during concatenation. A user probably doesn't expect columns to disappear during a concat, which is why this behavior is being removed in future pandas versions.
Upvotes: 7