nipy

Reputation: 5498

Drop duplicate Pandas dataframes from a list based on a subset of columns

I have a list of dataframes named tempDFList.

type(tempDFList)
list

type(tempDFList[0])
pandas.core.frame.DataFrame

They share a subset of columns, including Previous Pivot Price and Pivot Price, but not all of their columns are the same.

How can I do something like drop_duplicates with the default keep='first' so that no two frames in tempDFList have the same Previous Pivot Price and Pivot Price?

The desired output is a list of dataframes in which no two frames share the same Previous Pivot Price and Pivot Price. For the sample data below, only two would remain.

Output of df.to_dict() for each frame:

Each is a separate one-row dataframe, and all three are in tempDFList.

{'Re_236_H1': {0: nan},
 'Re_382_H1': {0: nan},
 'Re_50_H1': {0: 0.8677},
 'Re_618_H1': {0: 0.8668},
 'Previous Pivot Date': {0: '2021-04-13 09:00:00'},
 'Previous Pivot Price': {0: 0.86408},
 'Date': {0: Timestamp('2021-04-13 13:00:00')},
 'Pivot Price': {0: 0.871180},
 'Pivot Length': {0: 0.007099}}


{'Re_236_M15': {0: nan},
 'Re_382_M15': {0: nan},
 'Re_50_M15': {0: 0.8677},
 'Re_618_M15': {0: 0.8668},
 'Previous Pivot Date': {0: '2021-04-13 09:45:00'},
 'Previous Pivot Price': {0: 0.86408},
 'Date': {0: Timestamp('2021-04-13 13:00:00')},
 'Pivot Price': {0: 0.871180},
 'Pivot Length': {0: 0.007099}}

{'Re_236_H4': {0: nan},
 'Re_382_H4': {0: nan},
 'Re_50_H4': {0: 0.8677},
 'Re_618_H4': {0: 0.8668},
 'Previous Pivot Date': {0: '2021-04-14 09:00:00'},
 'Previous Pivot Price': {0: 0.89408},
 'Date': {0: Timestamp('2021-04-13 13:00:00')},
 'Pivot Price': {0: 0.891180},
 'Pivot Length': {0: 0.008099}}

Upvotes: 1

Views: 62

Answers (1)

Andrej Kesely

Reputation: 195543

Maybe simple filtering with a set will do:

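out = []
seen = set()  # (Previous Pivot Price, Pivot Price) pairs already kept
for d in tempDFList:
    # Each frame has one row, so the first row's values identify it.
    t = (d["Previous Pivot Price"].iat[0], d["Pivot Price"].iat[0])
    if t not in seen:  # keep only the first frame for each pair
        out.append(d)
        seen.add(t)

print(*out, sep="\n\n")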

Prints:

   Re_236_H1  Re_382_H1  Re_50_H1  Re_618_H1  Previous Pivot Date  Previous Pivot Price                Date  Pivot Price  Pivot Length  df
0        NaN        NaN    0.8677     0.8668  2021-04-13 09:00:00               0.86408 2021-04-13 13:00:00      0.87118      0.007099  df

   Re_236_H4  Re_382_H4  Re_50_H4  Re_618_H4  Previous Pivot Date  Previous Pivot Price                Date  Pivot Price  Pivot Length   df
0        NaN        NaN    0.8677     0.8668  2021-04-14 09:00:00               0.89408 2021-04-13 13:00:00      0.89118      0.008099  df2
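
Since the question asks about drop_duplicates specifically, the same filtering can probably also be expressed with it. The following is only a sketch, assuming every frame has exactly one row and contains both key columns. The helper name drop_duplicate_frames and the cut-down sample frames are made up for illustration and are not part of the original code.

```python
import pandas as pd

KEY_COLS = ["Previous Pivot Price", "Pivot Price"]


def drop_duplicate_frames(frames, key_cols=KEY_COLS):
    """Keep the first frame for each distinct combination of key_cols.

    Hypothetical helper: only the first row of each frame is inspected.
    """
    # Build one row per frame holding that frame's key values.
    keys = pd.DataFrame(
        [[f[c].iat[0] for c in key_cols] for f in frames],
        columns=key_cols,
    )
    # The surviving index labels are positions in the original list.
    kept = keys.drop_duplicates(keep="first").index
    return [frames[i] for i in kept]


# Cut-down versions of the three sample frames from the question.
tempDFList = [
    pd.DataFrame({"Re_50_H1": [0.8677], "Previous Pivot Price": [0.86408], "Pivot Price": [0.871180]}),
    pd.DataFrame({"Re_50_M15": [0.8677], "Previous Pivot Price": [0.86408], "Pivot Price": [0.871180]}),
    pd.DataFrame({"Re_50_H4": [0.8677], "Previous Pivot Price": [0.89408], "Pivot Price": [0.891180]}),
]

out = drop_duplicate_frames(tempDFList)
print(len(out))
```

Like the set-based loop, this compares the prices by exact float equality, so values that differ only by rounding noise would count as distinct.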

Upvotes: 1
