Reputation: 291
I have the follow data structure:
speed frequency_count
900.0 5
902.0 1
905.0 8
906.0 1
910.0 15
911.0 1
914.0 1
915.0 45
916.0 4
Initially this data was 2 lists , i tried to loop and create a new dictionary but i failed.My expected result is, the result also could be a dictionary. It does not matter really.
speed frequency_count range sum_freq
900.0 5 900-950 15
902.0 1 900-950 15
905.0 8 900-950 15
956.0 1 900-950 15
960.0 15 950-1000 17
981.0 1 950-1000 17
944.0 1 950-1000 17
If the data was in list format:
x_list = [900.0, 902.0, 905.0, 906.0, 950.0, 951.0, 954.0, 955.0, 986.0, 987.0]
y_list= [5, 1, 8, 1, 15, 1, 1, 45, 4, 5]
How could you either transform to a dictionary of ranges or 2 lists of groupsums? Thansks a lot im actually trying to learn different format manipulations rather than pandas as well.
Upvotes: 2
Views: 99
Reputation: 71580
The second part is the same as @jezrael's but first isn't.
But anyways,
Use pandas.DataFrame.apply
+ pandas.DataFrame.transform
:
df['range'] = df['speed'].apply(lambda x: '900-950' if x in range(900,950) else '950-1000')
df['sum_freq'] = df.groupby('range')['frequency_count'].transform(sum)
print(df)
Output:
speed frequency_count range sum_freq
0 900.0 5 900-950 15
1 902.0 1 900-950 15
2 905.0 8 900-950 15
3 956.0 1 950-1000 17
4 960.0 15 950-1000 17
5 981.0 1 950-1000 17
6 944.0 1 900-950 15
Upvotes: 0
Reputation: 862771
Use cut
with GroupBy.transform
:
df['range'] = pd.cut(df['speed'],
bins=[900,950,1000],
labels=['900-950','950-1000'],
include_lowest=True)
df['sum_freq'] = df.groupby('range')['frequency_count'].transform('sum')
print (df)
speed frequency_count range sum_freq
0 900.0 5 900-950 15
1 902.0 1 900-950 15
2 905.0 8 900-950 15
3 956.0 1 950-1000 17
4 960.0 15 950-1000 17
5 981.0 1 950-1000 17
6 944.0 1 900-950 15
More dynamic solution with labels from bins:
x_list = [900.0, 902.0, 905.0, 906.0, 950.0, 951.0, 954.0, 955.0, 986.0, 987.0]
y_list= [5, 1, 8, 1, 15, 1, 1, 45, 4, 5]
df = pd.DataFrame({'speed':x_list, 'frequency_count':y_list})
bins = [900,950,1000]
labels = ['{}-{}'.format(i + 1, j) for i, j in zip(bins[:-1], bins[1:])]
labels[0] = '{}-{}'.format(bins[0], bins[1])
df['range'] = pd.cut(df['speed'], bins=bins, labels=labels, include_lowest=True)
df['sum_freq'] = df.groupby('range')['frequency_count'].transform('sum')
print (df)
speed frequency_count range sum_freq
0 900.0 5 900-950 30
1 902.0 1 900-950 30
2 905.0 8 900-950 30
3 906.0 1 900-950 30
4 950.0 15 900-950 30
5 951.0 1 951-1000 56
6 954.0 1 951-1000 56
7 955.0 45 951-1000 56
8 986.0 4 951-1000 56
9 987.0 5 951-1000 56
Upvotes: 7