Pratham_Amitabh
Pratham_Amitabh

Reputation: 71

Calculating total unique values per column

I am trying to use the below data to get the 'Total Facebook likes' for each unique actor. The output should be in two columns, column 1 containing the unique actor names from all the actor_name columns and column 2 should have the total likes from all three actor_facebook_likes columns. Any idea on how this can done, will be appreciated.

{'actor_1_name': {0: 'Ryan Gosling',
  1: 'Ginnifer Goodwin',
  2: 'Dev Patel',
  3: 'Amy Adams',
  4: 'Casey Affleck'},
 'actor_2_name': {0: 'Emma Stone',
  1: 'Jason Bateman',
  2: 'Nicole Kidman',
  3: 'Jeremy Renner',
  4: 'Michelle Williams '},
 'actor_3_name': {0: 'Amiée Conn',
  1: 'Idris Elba',
  2: 'Rooney Mara',
  3: 'Forest Whitaker',
  4: 'Kyle Chandler'},
 'actor_1_facebook_likes': {0: 14000, 1: 2800, 2: 33000, 3: 35000, 4: 518},
 'actor_2_facebook_likes': {0: 19000.0,
  1: 28000.0,
  2: 96000.0,
  3: 5300.0,
  4: 71000.0},
 'actor_3_facebook_likes': {0: nan, 1: 27000.0, 2: 9800.0, 3: nan, 4: 3300.0}}

Upvotes: 2

Views: 72

Answers (2)

wwnde
wwnde

Reputation: 26676

Use pivot to get sum of likes for each actor in each facebook like category

 df3=pd.pivot_table(df,columns=['actor_1_name', 'actor_2_name', 'actor_3_name'],values=['actor_1_facebook_likes', 'actor_2_facebook_likes',
           'actor_3_facebook_likes'],aggfunc=[np.sum]).reset_index()

Melt the Actors, groupby and sum all categories

res=pd.melt(df3,id_vars=['sum'], value_vars=['actor_1_name', 'actor_2_name', 'actor_3_name']).groupby('value').agg(Totallikes =('sum', 'sum')).reset_index()

Rename the columns

res.columns=['Actor','Totallikes']

print(res)

                 Actor  Totallikes
0           Amiée Conn     33000.0
1            Amy Adams     40300.0
2        Casey Affleck     74818.0
3            Dev Patel    138800.0
4           Emma Stone     33000.0
5      Forest Whitaker     40300.0
6     Ginnifer Goodwin     57800.0
7           Idris Elba     57800.0
8        Jason Bateman     57800.0
9        Jeremy Renner     40300.0
10       Kyle Chandler     74818.0
11  Michelle Williams      74818.0
12       Nicole Kidman    138800.0
13         Rooney Mara    138800.0
14        Ryan Gosling     33000.0

Upvotes: 1

chambeeee
chambeeee

Reputation: 95

This makes the job :

df0 = pd.DataFrame({'actor_1_name': {0: 'Ryan Gosling',
  1: 'Ginnifer Goodwin',
  2: 'Dev Patel',
  3: 'Amy Adams',
  4: 'Casey Affleck'},
 'actor_2_name': {0: 'Emma Stone',
  1: 'Jason Bateman',
  2: 'Nicole Kidman',
  3: 'Jeremy Renner',
  4: 'Michelle Williams '},
 'actor_3_name': {0: 'Amiée Conn',
  1: 'Idris Elba',
  2: 'Rooney Mara',
  3: 'Forest Whitaker',
  4: 'Kyle Chandler'},
 'actor_1_facebook_likes': {0: 14000, 1: 2800, 2: 33000, 3: 35000, 4: 518},
 'actor_2_facebook_likes': {0: 19000.0,
  1: 28000.0,
  2: 96000.0,
  3: 5300.0,
  4: 71000.0},
 'actor_3_facebook_likes': {0: 0, 1: 27000.0, 2: 9800.0, 3: 0, 4: 3300.0}})

df1 = pd.concat([df0, df0, df0])

dfa = pd.DataFrame()

for i in range(0, 3):
    names = list(df1.iloc[3*i:4+3*i, i])
    val = df1.iloc[3*i:4+3*i, 3+i]
    df = pd.DataFrame(names)
    df['value'] = val
    dfa = pd.concat([dfa, df], axis = 0)

Upvotes: 1

Related Questions