Leonard
Leonard

Reputation: 43

Aggregate DataFrame base on list values

I have the next problem.

I have a list with string values:

a = ['word1', 'word2', 'word3', 'word4', ..., 'wordN']

And I have the dataframe with values:

+--------------+----------+-----------+
| keywords | impressions  | clicks     | 
+--------------+----------+-----------+
| word1    | 1245523      |   12321231 |              
+--------------+----------+-----------+
| word2    | 4212321      |  12312312  |      
+--------------+----------+-----------+
........................................

Please advice me on how to create a specific, aggregated dataframe with column values from list and with sum of the impressions and clicks columns if the word from list is met in keyword column.

I've tried to iterate through dataframe with iterrows() method but it does not work for this situation.

Upvotes: 0

Views: 133

Answers (3)

Leonard
Leonard

Reputation: 43

Found the way:

b = []
for i in a:
  b.append((a, checking_data[checking_data['keywords'].str.contains(a)][['impressions', 'clicks']].sum().values[0], 
               checking_data[checking_data['keywords'].str.contains(a)][['impressions', 'clicks']].sum().values[1]))


groupedOne_df = pd.DataFrame.from_records(b, columns = ['keywords', 'impressions', 'clicks'])

Now you can create a pandas df from those values.

Upvotes: 0

rhug123
rhug123

Reputation: 8768

You would want to filter your df to make sure you are only using items in the list.

df = df[df['keywords'].isin(a)]

Then you would use groupby to aggregate your results

df.groupby('keywords', as_index=False).sum()

Upvotes: 1

Gustav Rasmussen
Gustav Rasmussen

Reputation: 3961

specify the df, then subtract the columns not to sum ("keywords"), finally loop over the list of words:

import pandas as pd

a = ['word1', 'word2']

df = pd.DataFrame([
    ["word1", 1245523, 12321231],
    ["word2", 4212321, 12312312]
],
columns=["keywords", "impressions", "clicks"]
)

col_list = list(df)
col_list.remove('keywords')

for word in a:
    df[word] = df[col_list].sum(axis=1)

print(df)

Returns:

  keywords  impressions    clicks     word1     word2
0    word1      1245523  12321231  13566754  13566754
1    word2      4212321  12312312  16524633  16524633

Upvotes: 0

Related Questions