Reputation: 15
I have a dataframe similar to:
City %SC Team
0 London 50.5 A
1 London 40.1 B
2 London 9.4 C
3 Birmingham 31.3 B
4 Birmingham 27.1 A
5 Birmingham 23.7 D
6 Birmingham 17.9 C
7 York 40.1 A
8 York 38.8 C
9 York 21.1 B
.
.
.
I want to separate the cities in Clear win, Marginal win, Extremely Marginal win based on the difference of the top 2 teams. I have tried the following code:
df = pd.read_csv('file.csv')
Clear, Marginal, EMarginal = [],[],[]
for i in file['%SC']:
if i[0] - i[1] >= 10:
Clear.append('City','Team')
elif i[0] - i[1] < 10 and i[0] - i[1] >=2 :
Marginal.append('City','Team')
else:
EMarginal.append('City','Team')
Expected output:
Clear = [London , A]
Marginal = [Birmingham , B]
EMarginal = [York , A]
My approach doesn't seem right, can anyone suggest a way I could achieved the desired result? Many thanks
Upvotes: 0
Views: 58
Reputation: 2361
If I understand correctly, you want to divide the cities into groups according to the first two teams.
def classify(city):
largest = city.nlargest(2, '%SC')
diff = largest['%SC'].iloc[0] - largest['%SC'].iloc[1]
if diff >= 10:
return 'Clear'
elif diff < 10 and diff >=2 :
return 'Marginal'
return 'EMarginal'
groups = df.groupby("City").apply(classify)
# groups is the following table:
# City
# Birmingham Marginal
# London Clear
# York EMarginal
# dtype: object
If you insist to have a them as a list, you can call
groups.groupby(groups).apply(lambda g: list(g.index)).to_dict()
# Output:
# {'Clear': ['London'], 'EMarginal': ['York'], 'Marginal': ['Birmingham']}
If you still insist to include the winning team in each city, you can call
groups.name = "Margin"
df.join(groups, on="City")\
.groupby("Margin")\
.apply(
lambda g: list(
g.nlargest(1, "%SC")
.apply(lambda row: (row["City"], row["Team"]), axis=1)
)
).to_dict()
# Output
# {'Clear': [('London', 'A')], 'EMarginal': [('York', 'A')], 'Marginal': [('Birmingham', 'B')]}
Upvotes: 1