Calculation in a dataframe column

Question

I have a dataframe similar to:

  City       %SC    Team
0 London     50.5   A
1 London     40.1   B
2 London     9.4    C
3 Birmingham 31.3   B
4 Birmingham 27.1   A
5 Birmingham 23.7   D
6 Birmingham 17.9   C
7 York       40.1   A
8 York       38.8   C
9 York       21.1   B
.
.
.

I want to separate the cities in Clear win, Marginal win, Extremely Marginal win based on the difference of the top 2 teams. I have tried the following code:

df = pd.read_csv('file.csv')
Clear, Marginal, EMarginal = [],[],[]
for i in file['%SC']:
    if i[0] - i[1] >= 10:
    Clear.append('City','Team')
    elif i[0] - i[1] < 10 and i[0] - i[1] >=2 :
    Marginal.append('City','Team')
    else:
    EMarginal.append('City','Team')

Expected output:

Clear = [London , A]
Marginal = [Birmingham , B]
EMarginal = [York , A]

My approach doesn't seem right, can anyone suggest a way I could achieved the desired result? Many thanks

tmrlvi · Accepted Answer

If I understand correctly, you want to divide the cities into groups according to the first two teams.

def classify(city):
    largest = city.nlargest(2, '%SC')
    diff = largest['%SC'].iloc[0] - largest['%SC'].iloc[1]
    if diff >= 10:
        return 'Clear'
    elif diff < 10 and diff >=2 :
        return 'Marginal'
    return 'EMarginal'

groups = df.groupby("City").apply(classify)

# groups is the following table:
# City
# Birmingham     Marginal
# London            Clear
# York          EMarginal
# dtype: object

If you insist to have a them as a list, you can call

groups.groupby(groups).apply(lambda g: list(g.index)).to_dict()
# Output:
# {'Clear': ['London'], 'EMarginal': ['York'], 'Marginal': ['Birmingham']}

If you still insist to include the winning team in each city, you can call

groups.name = "Margin"
df.join(groups, on="City")\
  .groupby("Margin")\
  .apply(
      lambda g: list(
          g.nlargest(1, "%SC")
           .apply(lambda row: (row["City"], row["Team"]), axis=1)
      )
  ).to_dict()
# Output
# {'Clear': [('London', 'A')], 'EMarginal': [('York', 'A')], 'Marginal': [('Birmingham', 'B')]}

Calculation in a dataframe column

Answers (1)

Related Questions