Reputation: 73
Below is a small section from my pandas dataframe. I would like to be able to get separate 'vel_x' histograms (counts, bins) for each value in count. Is there a fast, built-in way to do this without just looping through each value in count?
+-------+-------+-------+-------+--------+----+--------+
| | | x_loc | y_loc | vel_x | … | vel_z |
+-------+-------+-------+-------+--------+----+--------+
| count | slice | | | | | |
| 1 | 3 | 4 | 0 | 96 | 88 | 35 |
| | 4 | 10 | 2 | 54 | 42 | 37 |
| | 5 | 9 | 32 | 8 | 70 | 34 |
| | 6 | 36 | 89 | 69 | 46 | 78 |
| 2 | 5 | 17 | 41 | 48 | 45 | 71 |
| | 6 | 50 | 66 | 82 | 72 | 59 |
| | 7 | 14 | 24 | 55 | 20 | 89 |
| | 8 | 76 | 36 | 13 | 14 | 21 |
| 3 | 5 | 97 | 19 | 41 | 61 | 72 |
| | 6 | 22 | 4 | 56 | 82 | 15 |
| | 7 | 17 | 57 | 30 | 63 | 88 |
| | 8 | 83 | 43 | 35 | 8 | 4 |
+-------+-------+-------+-------+--------+----+--------+
I have tried many methods (apply, map, etc.), but I have not been able to get any of them to work. Each method just applies the mapped function to all the row values.
Essentially, I want to map this to each value in count (count_value) below:
import numpy as np

def create_histogram(data, count_value):
    values, bin_edges = np.histogram(data.loc[count_value, 'vel_x'])
    return values
then something like this:
data.index.get_level_values('count').map(create_histogram(data))
Also, for reference, this is the way I can currently perform what I want, but it is not very efficient because my dataframe is very large.
for count_value in data.index.get_level_values('count').unique():
    values, bin_edges = np.histogram(data.loc[count_value, 'vel_x'])
The returned values can then be stored in another array.
Thank you in advance for your help!
Upvotes: 0
Views: 2732
Reputation: 3842
How about using groupby with the level param:
level : int, level name, or sequence of such, default None. If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
for count, sdf in df.groupby(level=0):
    values, bin_edges = np.histogram(sdf.loc[count, 'vel_x'])
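A self-contained sketch of that loop, using the twelve vel_x values from the question's table, might look like the following. The names df, bin_edges and histograms are illustrative only. The shared bin edges are an assumption added here so that counts are comparable across groups; left to itself, np.histogram chooses edges from each group's own minimum and maximum.

```python
import numpy as np
import pandas as pd

# Rebuild the sample rows from the question (only vel_x is needed here).
index = pd.MultiIndex.from_arrays(
    [
        [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
        [3, 4, 5, 6, 5, 6, 7, 8, 5, 6, 7, 8],
    ],
    names=["count", "slice"],
)
df = pd.DataFrame(
    {"vel_x": [96, 54, 8, 69, 48, 82, 55, 13, 41, 56, 30, 35]},
    index=index,
)

# Assumption: one common set of 10 bins computed from all vel_x values.
bin_edges = np.histogram_bin_edges(df["vel_x"].to_numpy(), bins=10)

# One histogram per value of the "count" index level.
histograms = {}
for count_value, group in df["vel_x"].groupby(level="count"):
    values, _ = np.histogram(group.to_numpy(), bins=bin_edges)
    histograms[count_value] = values
```

Each entry of histograms is then an array of 10 counts, one per shared bin, keyed by the count value.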
UPDATE
Since you think the way mean(level=level) works is better, you can also try this way, which is inspired by the mean source code:
df['vel_x'].groupby(level=0).aggregate(np.histogram)
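If a single 2-D array of counts on common bins is preferred, one loop-free alternative (not part of the answer above, just a sketch) is to bin every row once with np.digitize and accumulate per group with np.add.at. The helper name grouped_histograms is hypothetical, and the sketch assumes the column contains no NaN values.

```python
import numpy as np
import pandas as pd

def grouped_histograms(series, level, bins=10):
    """Return (labels, counts, bin_edges); counts has shape (n_groups, n_bins)."""
    values = series.to_numpy()
    # Shared edges spanning the full data range, so no value falls outside.
    bin_edges = np.histogram_bin_edges(values, bins=bins)
    n_bins = len(bin_edges) - 1
    # Integer code per row identifying its group on the chosen index level.
    codes, labels = pd.factorize(series.index.get_level_values(level))
    # Bin number per row; the clip folds the maximum value into the last bin,
    # matching np.histogram's closed right edge.
    bin_idx = np.clip(np.digitize(values, bin_edges) - 1, 0, n_bins - 1)
    counts = np.zeros((len(labels), n_bins), dtype=np.int64)
    # Unbuffered accumulation so repeated (group, bin) pairs are all counted.
    np.add.at(counts, (codes, bin_idx), 1)
    return labels, counts, bin_edges

# Example with the vel_x values from the question's table.
idx = pd.MultiIndex.from_arrays(
    [
        [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
        [3, 4, 5, 6, 5, 6, 7, 8, 5, 6, 7, 8],
    ],
    names=["count", "slice"],
)
vel_x = pd.Series([96, 54, 8, 69, 48, 82, 55, 13, 41, 56, 30, 35], index=idx, name="vel_x")
labels, counts, bin_edges = grouped_histograms(vel_x, level="count")
```

Row i of counts is the histogram for labels[i]. The work is done in a handful of array operations regardless of how many distinct count values there are, which may matter for a very large frame.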
Upvotes: 3