R. Cox

Reputation: 879

Compare columns in a dictionary of dataframes

I have a dictionary of dataframes (Di_1). Each dataframe has the same number of columns, column names, number of rows and row indexes. I also have a list of the names of the dataframes (dfs). I would like to compare the contents of one of the columns (A) in each dataframe with those of the last dataframe in the list to see whether they are the same. For example:

import pandas as pd

df_A = pd.DataFrame({'A': [1,0,1,0]})
df_B = pd.DataFrame({'A': [1,1,0,0]})

Di_1 = {'X': df_A, 'Y': df_B}

dfs  = ['X','Y']

I tried:

for df in dfs:
    Di_1[str(df)]['True'] = Di_1[str(df)]['A'].equals(Di_1[str(dfs[-1])]['A'])

I got:

[0,0,0,0]

I would like to get:

[1,0,0,1]

My attempt checks whether the whole column is the same, but I would instead like to compare each dataframe with the last one row by row.

Upvotes: 2

Views: 773

Answers (2)

willeM_ Van Onsem

Reputation: 476659

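I think you are making this more complicated than it needs to be. You can compare the column element-wise with == against the column of the last dataframe: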

series_last = Di_1[dfs[-1]]['A']

for df in map(Di_1.get, dfs):
    df['True'] = df['A'] == series_last

This will produce as result:

>>> df_A
   A   True
0  1   True
1  0  False
2  1  False
3  0   True
>>> df_B
   A  True
0  1  True
1  1  True
2  0  True
3  0  True

So each dataframe now has an extra column named 'True' (you may want to pick a different name) that records, for each row, whether the value in A equals the corresponding value in series_last.
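To make the difference from the attempt in the question concrete, here is a minimal self-contained sketch using the question's data. The column name same_as_last and the variable whole_column_equal are only illustrative choices, not something from the original post. It shows that Series.equals yields a single boolean for the whole column, while == (or Series.eq) compares row by row:

```python
import pandas as pd

# Sample data from the question
df_A = pd.DataFrame({'A': [1, 0, 1, 0]})
df_B = pd.DataFrame({'A': [1, 1, 0, 0]})
Di_1 = {'X': df_A, 'Y': df_B}
dfs = ['X', 'Y']

series_last = Di_1[dfs[-1]]['A']

# Series.equals collapses the comparison into one boolean for the whole column
whole_column_equal = df_A['A'].equals(series_last)

# Series.eq (same as ==) compares element by element, aligned on the index.
# 'same_as_last' is a hypothetical column name; astype(int) gives 1/0 flags.
for name in dfs:
    frame = Di_1[name]
    frame['same_as_last'] = frame['A'].eq(series_last).astype(int)

print(whole_column_equal)
print(df_A['same_as_last'].tolist())
print(df_B['same_as_last'].tolist())
```

Assigning the single boolean from equals to a column broadcasts it to every row, which is why the original attempt cannot produce a per-row result.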

If dfs contains something other than strings, we can first convert its items to strings:

series_last = Di_1[str(dfs[-1])]['A']

for df in map(Di_1.get, map(str, dfs)):
    df['True'] = df['A'] == series_last

Upvotes: 2

anky

Reputation: 75080

Create a list:

l=[Di_1[i] for i in dfs]

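Then, using isin(), you can compare the first and last dataframes. When isin() is given a DataFrame, a cell matches only if both the index and the column labels match, so this is an element-wise comparison: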

l[0].isin(l[-1]).astype(int)

   A
0  1
1  0
2  0
3  1
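If every dataframe in dfs should be compared with the last one, not only the first, the same idea can be applied in a loop. This is only a sketch under the question's setup; the dictionary name flags is an illustrative choice, and the comparison is restricted to column A in case the dataframes carry other columns:

```python
import pandas as pd

# Sample data from the question
df_A = pd.DataFrame({'A': [1, 0, 1, 0]})
df_B = pd.DataFrame({'A': [1, 1, 0, 0]})
Di_1 = {'X': df_A, 'Y': df_B}
dfs = ['X', 'Y']

last = Di_1[dfs[-1]]

# DataFrame.isin with a DataFrame argument matches on index and column labels,
# so each cell is compared with the cell at the same position in `last`.
# 'flags' is a hypothetical name for the per-dataframe results.
flags = {
    name: Di_1[name][['A']].isin(last[['A']]).astype(int)
    for name in dfs
}

print(flags['X'])
print(flags['Y'])
```

This leaves the original dataframes unchanged and keeps the 1/0 results in a separate dictionary keyed by the same names.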

Upvotes: 1
