Reputation: 183
I'm stuck. I have a pandas DataFrame with j columns, each containing 1000 values; each column represents a day. I also have an array of individual values, one expected value per day.
I would like to find out what proportion of the values in each column are greater than the expected value for the previous day.
DF:
D1 D2 D3
5 6 9
10 2 1
3 9 2
Array:
(2, 4, 5)
For column D2, what proportion of values are greater than 2? For column D3, what proportion are greater than 4? For column D4 (not shown), what proportion are greater than 5 (the expected value for D3)? And so on...
In this case, it would be 66% (2/3) for D2 and 33% (1/3) for D3.
Any help is appreciated. Thank you!
Upvotes: 1
Views: 814
Reputation: 75150
You can use:
arr = (2, 4, 5)
d = dict(zip(df.drop(columns="D1").columns, arr))
pd.Series([df[k].gt(v).sum()/df.shape[0] for k,v in d.items()],index=d.keys())
D2 0.666667
D3 0.333333
dtype: float64
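A loop-free variant of the same idea, as a minimal sketch: it assumes the sample DataFrame and array from the question, and the names cols, thresholds and proportions are only illustrative. It relies on the fact that the mean of a boolean column is the proportion of True values.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"D1": [5, 10, 3], "D2": [6, 2, 9], "D3": [9, 1, 2]})
arr = (2, 4, 5)

# Every column except the first is compared with the expected value
# of the previous day, so the last threshold is unused here.
cols = df.columns[1:]
thresholds = pd.Series(list(arr[:len(cols)]), index=cols)

# gt(..., axis=1) matches each threshold to its column label;
# mean() of the boolean result gives the proportion above the threshold.
proportions = df[cols].gt(thresholds, axis=1).mean()
print(proportions)
```

With 1000 rows per column this avoids building the dictionary and the Python-level loop, but the result should be the same Series as above.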
Upvotes: 2
Reputation: 4864
First, you shift the columns:
df1 = df[df.columns[1:]]
Then you shift the array:
ar1 = ar[:-1]
Then you subtract one from the other:
df2 = df1.apply(lambda x: x - ar1, axis=1)
Then you can count negative and positive entries to your heart's content.
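One way the counting step might look, as a sketch only: it assumes ar is a NumPy array and df is the sample frame from the question, and it uses plain broadcasting instead of apply for the subtraction. The names n_above and share_above are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; `ar` as a NumPy array is an assumption
df = pd.DataFrame({"D1": [5, 10, 3], "D2": [6, 2, 9], "D3": [9, 1, 2]})
ar = np.array([2, 4, 5])

df1 = df[df.columns[1:]]   # drop the first day
ar1 = ar[:-1]              # drop the last expected value

# Subtracting a 1-D array with one entry per column broadcasts across rows
df2 = df1 - ar1

# Strictly positive differences are the values greater than the threshold
n_above = (df2 > 0).sum()
share_above = (df2 > 0).mean()
print(n_above)
print(share_above)
```

Values exactly equal to the threshold give a difference of zero and are not counted as greater.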
Upvotes: 0
Reputation: 197
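All you need to do is loop through each value and check whether it is larger than the expected number.
column_num = 2
more_than_expected = 0
values_total = 0
# Column names are strings, so convert the number before concatenating
for val in df["D" + str(column_num)]:
    # arr[0] is the expected value for D2, arr[1] for D3, and so on
    if val > arr[column_num - 2]:
        more_than_expected += 1
    values_total += 1
print(more_than_expected / values_total)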
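To cover every column rather than a single one, the same loop can be wrapped in a function. This is a rough sketch assuming the sample data from the question; proportion_above is a hypothetical helper name, not something pandas provides.

```python
import pandas as pd

def proportion_above(df, arr):
    """Hypothetical helper: share of values in each column (from the second
    one on) that exceed the expected value of the previous day."""
    result = {}
    # zip pairs D2 with arr[0], D3 with arr[1], ... and stops at the shorter one
    for col, expected in zip(df.columns[1:], arr):
        values = df[col]
        more_than_expected = sum(1 for val in values if val > expected)
        result[col] = more_than_expected / len(values)
    return result

# Sample data copied from the question
df = pd.DataFrame({"D1": [5, 10, 3], "D2": [6, 2, 9], "D3": [9, 1, 2]})
arr = (2, 4, 5)
result = proportion_above(df, arr)
print(result)
```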
Hope that helps
Upvotes: 0