Reputation: 553
assuming I have two dataframes: (1) list of ranges (2) actual values
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
SLA = {'Wertebereich': [5, 10, 15, 20, 25]}
SLA = pd.DataFrame(data=SLA)
messwerte = pd.DataFrame(np.random.randint(0,30,size=10),
columns=["Messwerte"],
index=pd.date_range("20180101", periods=10))
Wertebereich
0 5
1 10
2 15
3 20
4 25
Messwerte
2018-01-01 22
2018-01-02 13
2018-01-03 14
2018-01-04 17
2018-01-05 1
2018-01-06 11
2018-01-07 17
2018-01-08 6
2018-01-09 4
2018-01-10 10
I now want to add a new column to SLA ("Count") where I sum up all occurrences within each range.
I've created an iterating solution but was wondering if there is a more pandas way which can also handle 10000x3000 rows faster than my solution.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
SLA = {'Wertebereich': [5, 10, 15, 20, 25]}
SLA = pd.DataFrame(data=SLA)
messwerte = pd.DataFrame(np.random.randint(0,30,size=10),
columns=["Messwerte"],
index=pd.date_range("20180101", periods=10))
#print(SLA.to_string())
#print(messwerte.to_string())
###############
SLA["Count"] = 0
for i in range(0, len(SLA)-1):
counter = 0
treshold_min = SLA.iloc[i].get('Wertebereich')
treshold_max = SLA.iloc[i+1].get('Wertebereich')
for x in range(0, len(messwerte)):
val = messwerte.iloc[x].get('Messwerte')
print('---- ' + str(val) )
if ((val >= treshold_min) & (val < treshold_max)):
counter = counter +1
SLA.ix[i,'Count'] = counter
print(SLA.to_string())
print(messwerte.to_string())
any idea is appreciated.
thanks!
Upvotes: 1
Views: 32
Reputation: 153460
Try this:
messwerte['Messwerte'].value_counts(bins=SLA['Wertebereich'])
Output:
(20.0, 25.0] 5
(4.999, 10.0] 2
(10.0, 15.0] 1
(15.0, 20.0] 0
Name: Messwerte, dtype: int64
Upvotes: 1