Luc
Luc

Reputation: 747

Count values in columns with specific column names

Subs_1718   Count_1718  Subs_1819   Count_1819  Subs_1920   Count_1920
Apple           10.0      Grapes         12       Banana       12.0
Grapes           2.0      Apple           6       Grapes        8.0
Banana           2.0      Pineapple       3       Cashew        1.0
Dragonfruit      1.0      Banana          2       Apple         1.0
Kiwi             1.0      Kiwi            2       Melon         1.0
Melon            1.0      Cashew          1       Grapes        1.0

How do I create a new column that value_counts the values of the columns of df['Count_1718'], df['Count_1819'],df['Count_1920']?

Expected Output:

Subs_1720         Count_1720
Apple              17
Banana             16
Cashew              2
Dragonfruit         1
Grapes             22
Melon               2 
Pineapple           1

Upvotes: 1

Views: 126

Answers (3)

sammywemmy
sammywemmy

Reputation: 28644

You can convert the columns to a multiindex, stack, and do a groupby :

df.columns = df.columns.str.split("_", expand=True)
(df.stack()
.groupby("Subs")
.sum()
.reset_index()
.set_axis(["Subs_1720", "Count_1720"], axis=1))

    Subs_1720   Count_1720
0   Apple       17.0
1   Banana      16.0
2   Cashew      2.0
3   Dragonfruit 1.0
4   Grapes      23.0
5   Kiwi        3.0
6   Melon       2.0
7   Pineapple   3.0

Upvotes: 2

jezrael
jezrael

Reputation: 862641

Use wide_to_long with aggregate sum:

df1 = (pd.wide_to_long(df.reset_index(),
                    i='index', 
                    stubnames=['Subs','Count'], 
                    sep='_', 
                    j='d')
       .groupby('Subs')['Count']
       .sum()
       .rename_axis('Subs_1720')
       .reset_index(name='Count_1720'))
print (df1)
     Subs_1720  Count_1720
0        Apple        17.0
1       Banana        16.0
2       Cashew         2.0
3  Dragonfruit         1.0
4       Grapes        23.0
5         Kiwi         3.0
6        Melon         2.0
7    Pineapple         3.0

Upvotes: 2

yatu
yatu

Reputation: 88236

You can use pd.wide_to_long here specifying the corresponding stubnames with a groupby.sum:

(pd.wide_to_long(df.reset_index(), 
                 stubnames=['Subs', 'Count'], 
                 i='index', j='ix', 
                 suffix= '_\d+')
   .groupby('Subs').sum())

             Count
Subs              
Apple         17.0
Banana        16.0
Cashew         2.0
Dragonfruit    1.0
Grapes        23.0
Kiwi           3.0
Melon          2.0
Pineapple      3.0

Upvotes: 4

Related Questions