Reputation: 80
I have a list like:
list_of_list = [["aa", "yy"], ["gg", "xx"]]
and a pandas dataframe like this:
month column1 column2
0 June xx aa
1 June gg xx
2 August xx yy
I go through each sub_list of the list_of_lists and check if each item of the sub_list exists in any of the columns of the dataframe.
for sub_list in list_of_lists:
print("\n")
print("Sub_list: ", sub_list)
for item in sub_list:
for col in df.columns:
if item in df[col].values:
print(df[col][df[col] == item].to_frame())
This returns me:
Sub_list: ['aa', 'yy']
column2
0 aa
column2
2 yy
Sub_list: ["gg", "xx"]
column1
1 gg
column1
0 xx
column1
2 xx
column2
1 xx
What I want is except to the above to return also the index and the corresponding month when and only when, each sub_list of the list_of_lists is not in the same index
So for list_of_list = [["aa", "yy"]]
except to the above code i want also to return
0 June aa
2 August yy
Any ideas?
Upvotes: 0
Views: 948
Reputation: 120559
Use melt
to flat your original dataframe and create a dataframe from your list_of_list
then merge them and finally remove duplicates in two pass.
Step 1. Format your dataframes
df1 = df.melt('month', ignore_index=False).reset_index()
df2 = pd.DataFrame(list_of_list).melt(ignore_index=False)['value'].reset_index()
print(df1, df2)
# Output
# df1 (from your original df)
index month variable value
0 0 June column1 xx
1 1 June column1 gg
2 2 August column1 xx
3 0 June column2 aa
4 1 June column2 xx
5 2 August column2 yy
# df2 (from list_of_list)
index value
0 0 aa
1 1 gg
2 0 yy
3 1 xx
Step 2. Merge your two dataframes
out = df1.merge(df2, on='value')
print(out)
# Output:
index_x month variable value index_y
0 0 June column1 xx 1
1 2 August column1 xx 1
2 1 June column2 xx 1
3 1 June column1 gg 1
4 0 June column2 aa 0
5 2 August column2 yy 0
Step 3. Keep only right rows
out = out.drop_duplicates(subset=['index_x', 'index_y'], keep=False) \
.drop_duplicates('value', keep=False)
print(out)
# Output:
index_x month variable value index_y
4 0 June column2 aa 0
5 2 August column2 yy 0
Step 4. Final output
out = out.set_index('index_x').rename_axis(index=None)[['month', 'value']]
print(out)
# Output:
month value
0 June aa
2 August yy
Edit
Start after step 2, create a new column notin
:
out['notin'] = out.groupby('index_y')['index_x'] \
.transform(lambda x: len(x) == len(set(x)))
print(out)
# Output:
index_x month variable value index_y notin
0 0 June column1 xx 1 False # For ["gg", "xx"]
1 2 August column1 xx 1 False
2 1 June column2 xx 1 False
3 1 June column1 gg 1 False
4 0 June column2 aa 0 True # For ["aa", "yy"]
5 2 August column2 yy 0 True
From here, what is your expect output?
Upvotes: 1
Reputation: 23166
stack
the DataFrame and then loop through your lists:
df["index"] = df.index
srs = df.set_index(["index", "month"]).stack().rename("value")
for sublist in list_of_list:
print(f"\nSub list: {sublist}")
print(srs[srs.isin(sublist)].reset_index(2, drop=True).reset_index().drop_duplicates("index", keep=False).drop_duplicates("value", keep=False))
df = pd.DataFrame({"month": ["June", "June", "August"],
"column1": ["xx", "gg", "xx"],
"column2": ["aa", "xx", "yy"]})
list_of_list = [["aa", "yy"], ["gg", "xx"]]
Sub list: ['aa', 'yy']
index month value
0 0 June aa
1 2 August yy
Sub list: ['gg', 'xx']
Empty DataFrame
Columns: [index, month, value]
Index: []
Upvotes: 0
Reputation: 1053
Could use the any
to achieve the same.
for sub_list in list_of_lists:
print("\n")
print("Sub_list: ", sub_list)
for item in sub_list:
print(df[(df == item).any(axis=1)])
Upvotes: 0