Reputation: 297
I have two columns with a list of each strings, named Tag 1, Tag 2. In simple terms I need to check if a string in a cell is found anywhere in the adjacent column and if so, swap the values.
Existing Dataframe
Tag 1 Tag 2
Amazon Twitter
Amazon Google
eBay Amazon
Reddit Facebook
Desired Output
Tag 1 Tag 2
Amazon Twitter
Amazon Google
Amazon eBay
Reddit Facebook
In the desired outcome, you can see that Amazon has switched places with eBay because it was found in the Tag 1 column.
Minimum Reproducible Example
import pandas as pd
data = {'Tag 1': ['Amazon', 'Amazon', 'eBay', 'Reddit'],
'Tag 2': ['Twitter', 'Google', 'Amazon', 'Facebook']}
df = pd.DataFrame(data)
I've been researching similar posts but can't quite seem to get it exactly right.
Here's my code so far.
Upvotes: 0
Views: 495
Reputation: 120409
The answer of @sophocles is probably the right.
However if you just want to reorder columns in a lexicographical order for instance, try:
tags = ['Tag 1', 'Tag 2']
df[tags] = df[tags].apply(lambda x: sorted(tuple(x)), axis=1, result_type='expand')
print(df)
# Output
Tag 1 Tag 2
0 Amazon Twitter
1 Amazon Google
2 Amazon eBay
3 Facebook Reddit
Upvotes: 1
Reputation: 382
This might do it
import pandas as pd
import numpy as np
data = {'Tag 1': ['Amazon', 'Amazon', 'eBay', 'Reddit'],
'Tag 2': ['Twitter', 'Google', 'Amazon', 'Facebook']}
df = pd.DataFrame(data)
values = np.unique(df['Tag 1'].values) # All unique values in Tag 1
swapped = [] # So we don't double swap
for val in values:
rows = df.loc[df['Tag 2'] == val] # Get rows in Tag 2 that matches
for idx, row in rows.iterrows():
if idx in swapped: # Ignore if already swapped
continue
# Swap the values
temp = df.iloc[idx]['Tag 2']
df.iloc[idx]['Tag 2'] = df.iloc[idx]['Tag 1']
df.iloc[idx]['Tag 1'] = temp
swapped.append(idx)
There are probably more efficient solutions, but I think this is pretty clear. It goes through each unique value in the 'Tag 1' column and checks for matches in the 'Tag 2' column, and swaps them. I was not sure if you wanted to avoid double swaps, but that can be removed if it's not a requirement.
Upvotes: 1
Reputation: 13821
You could use pd.where
to check when a value in 'Tag 2' existis in 'Tag 1' using isin
and swap the values when you assign back:
df[['Tag 1','Tag 2']] = df[['Tag 2','Tag 1']].where(
df['Tag 2'].isin(df['Tag 1'].tolist()), df[['Tag 1','Tag 2']].values)
prints back:
Tag 1 Tag 2
0 Amazon Twitter
1 Amazon Google
2 Amazon eBay
3 Reddit Facebook
Upvotes: 1