Ian_De_Oliveira
Ian_De_Oliveira

Reputation: 291

create a range and sum using pandas python

I have the follow data structure:

      speed            frequency_count
      900.0                5
      902.0                1
      905.0                8
      906.0                1
      910.0               15
      911.0                1
      914.0                1
      915.0               45
      916.0                4

Initially this data was 2 lists , i tried to loop and create a new dictionary but i failed.My expected result is, the result also could be a dictionary. It does not matter really.

      speed            frequency_count          range              sum_freq
      900.0                5                   900-950                15
      902.0                1                   900-950                15
      905.0                8                   900-950                15
      956.0                1                   900-950                15
      960.0               15                   950-1000               17
      981.0                1                   950-1000               17
      944.0                1                   950-1000               17

If the data was in list format:

   x_list = [900.0, 902.0, 905.0, 906.0, 950.0, 951.0, 954.0, 955.0, 986.0, 987.0]

   y_list= [5, 1, 8, 1, 15, 1, 1, 45, 4, 5]

How could you either transform to a dictionary of ranges or 2 lists of groupsums? Thansks a lot im actually trying to learn different format manipulations rather than pandas as well.

Upvotes: 2

Views: 99

Answers (2)

U13-Forward
U13-Forward

Reputation: 71580

The second part is the same as @jezrael's but first isn't.

But anyways,

Use pandas.DataFrame.apply + pandas.DataFrame.transform:

df['range'] = df['speed'].apply(lambda x: '900-950' if x in range(900,950) else '950-1000')
df['sum_freq'] = df.groupby('range')['frequency_count'].transform(sum)
print(df)

Output:

   speed  frequency_count     range  sum_freq
0  900.0                5   900-950        15
1  902.0                1   900-950        15
2  905.0                8   900-950        15
3  956.0                1  950-1000        17
4  960.0               15  950-1000        17
5  981.0                1  950-1000        17
6  944.0                1   900-950        15

Upvotes: 0

jezrael
jezrael

Reputation: 862771

Use cut with GroupBy.transform:

df['range'] = pd.cut(df['speed'], 
                     bins=[900,950,1000], 
                     labels=['900-950','950-1000'], 
                     include_lowest=True)
df['sum_freq'] = df.groupby('range')['frequency_count'].transform('sum')
print (df)
   speed  frequency_count     range  sum_freq
0  900.0                5   900-950        15
1  902.0                1   900-950        15
2  905.0                8   900-950        15
3  956.0                1  950-1000        17
4  960.0               15  950-1000        17
5  981.0                1  950-1000        17
6  944.0                1   900-950        15

More dynamic solution with labels from bins:

x_list = [900.0, 902.0, 905.0, 906.0, 950.0, 951.0, 954.0, 955.0, 986.0, 987.0]
y_list= [5, 1, 8, 1, 15, 1, 1, 45, 4, 5]
df = pd.DataFrame({'speed':x_list, 'frequency_count':y_list})

bins = [900,950,1000]
labels = ['{}-{}'.format(i + 1, j) for i, j in zip(bins[:-1], bins[1:])] 
labels[0] = '{}-{}'.format(bins[0], bins[1])

df['range'] = pd.cut(df['speed'], bins=bins, labels=labels, include_lowest=True)
df['sum_freq'] = df.groupby('range')['frequency_count'].transform('sum')
print (df)
   speed  frequency_count     range  sum_freq
0  900.0                5   900-950        30
1  902.0                1   900-950        30
2  905.0                8   900-950        30
3  906.0                1   900-950        30
4  950.0               15   900-950        30
5  951.0                1  951-1000        56
6  954.0                1  951-1000        56
7  955.0               45  951-1000        56
8  986.0                4  951-1000        56
9  987.0                5  951-1000        56

Upvotes: 7

Related Questions