Ele
Ele

Reputation: 553

pandas count ranges in on two data frame

assuming I have two dataframes: (1) list of ranges (2) actual values

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

SLA = {'Wertebereich': [5, 10, 15, 20, 25]}
SLA = pd.DataFrame(data=SLA)


messwerte = pd.DataFrame(np.random.randint(0,30,size=10),
                  columns=["Messwerte"],
                  index=pd.date_range("20180101", periods=10))


       Wertebereich 
0             5      
1            10      
2            15     
3            20      
4            25      

            Messwerte
2018-01-01         22
2018-01-02         13
2018-01-03         14
2018-01-04         17
2018-01-05          1
2018-01-06         11
2018-01-07         17
2018-01-08          6
2018-01-09          4
2018-01-10         10

I now want to add a new column to SLA ("Count") where I sum up all occurrences within each range.

I've created an iterating solution but was wondering if there is a more pandas way which can also handle 10000x3000 rows faster than my solution.

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

SLA = {'Wertebereich': [5, 10, 15, 20, 25]}
SLA = pd.DataFrame(data=SLA)


messwerte = pd.DataFrame(np.random.randint(0,30,size=10),
                  columns=["Messwerte"],
                  index=pd.date_range("20180101", periods=10))


#print(SLA.to_string())
#print(messwerte.to_string())


###############
SLA["Count"] = 0

for i in range(0, len(SLA)-1):
    counter = 0
    treshold_min = SLA.iloc[i].get('Wertebereich')
    treshold_max = SLA.iloc[i+1].get('Wertebereich')
    for x in range(0, len(messwerte)):
        val = messwerte.iloc[x].get('Messwerte')
        print('---- ' + str(val) )
        if ((val >= treshold_min) & (val < treshold_max)):
            counter = counter +1

    SLA.ix[i,'Count'] = counter

print(SLA.to_string())
print(messwerte.to_string())

any idea is appreciated.

thanks!

Upvotes: 1

Views: 32

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

Try this:

messwerte['Messwerte'].value_counts(bins=SLA['Wertebereich'])

Output:

(20.0, 25.0]     5
(4.999, 10.0]    2
(10.0, 15.0]     1
(15.0, 20.0]     0
Name: Messwerte, dtype: int64

Upvotes: 1

Related Questions