logisticregress

Reputation: 183

Compare Column Values by Array

I'm stuck. I have a pandas DataFrame with j columns, each containing 1000 values; each column represents a day. I also have a separate array of individual values, one expected value per day.

I would like to find out what proportion of the values in each column are greater than the expected value for the previous day.

DF:


D1  D2  D3 
5   6   9
10  2   1 
3   9   2 

Array: 
(2, 4, 5)

For column D2, what proportion of values are greater than 2? For column D3, what proportion are greater than 4? For column D4 (not shown), what proportion are greater than 5? And so on.

In this case, it would be 66.7% (2/3) for D2 and 33.3% (1/3) for D3.

Any help is appreciated. Thank you!

Upvotes: 1

Views: 814

Answers (3)

anky

Reputation: 75150

You can use:

arr = (2, 4, 5)
# Map every column after D1 to the previous day's expected value
d = dict(zip(df.drop(columns="D1").columns, arr))
pd.Series([df[k].gt(v).sum() / df.shape[0] for k, v in d.items()], index=d.keys())

D2    0.666667
D3    0.333333
dtype: float64
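
The same result can probably be had without the intermediate dict. A minimal sketch, assuming the sample frame from the question and that the first len(columns) - 1 entries of the array line up with D2, D3, ... (the names `cols`, `thresholds` and `proportions` are just for illustration):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"D1": [5, 10, 3], "D2": [6, 2, 9], "D3": [9, 1, 2]})
arr = (2, 4, 5)

# Pair each column after the first with the previous day's expected value
cols = df.columns[1:]
thresholds = pd.Series(list(arr[:len(cols)]), index=cols)

# gt(..., axis=1) aligns the thresholds with the column labels;
# the mean of a boolean column is the proportion of True values
proportions = df[cols].gt(thresholds, axis=1).mean()
print(proportions)
```

Taking the mean of the boolean frame replaces the explicit sum divided by `df.shape[0]`, which should matter little at 1000 rows but keeps the expression short.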

Upvotes: 2

Igor Rivin

Reputation: 4864

First, you shift the columns:

df1 = df[df.columns[1:]]

Then you shift the array:

ar1 = ar[:-1]

Then you subtract one from the other:

df2 = df1.apply(lambda x: x - ar1, axis=1)

Then you can count negative and positive entries to your heart's content.
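
The counting step is not spelled out, so here is one way it might look. This is only a sketch: it assumes `ar` is a NumPy array holding the values from the question, and it uses plain broadcasting in place of the row-wise `apply` (the names `above`, `below` and `share_above` are made up for the example):

```python
import numpy as np
import pandas as pd

# Sample data from the question; ar as a NumPy array is an assumption
df = pd.DataFrame({"D1": [5, 10, 3], "D2": [6, 2, 9], "D3": [9, 1, 2]})
ar = np.array([2, 4, 5])

df1 = df[df.columns[1:]]   # drop the first day
ar1 = ar[:-1]              # drop the last expected value
df2 = df1 - ar1            # the 1-D array is broadcast across every row

above = (df2 > 0).sum()          # entries greater than the expected value
below = (df2 < 0).sum()          # entries smaller than the expected value
share_above = (df2 > 0).mean()   # proportion greater, per column
print(above, below, share_above, sep="\n")
```

Note that a value exactly equal to the expected value (the 2 in D2 here) gives a difference of zero, so it is counted as neither positive nor negative.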

Upvotes: 0

smal

Reputation: 197

All you need to do is loop through each value in the column and check whether it is larger than the expected value.

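column_num = 2
more_than_expected = 0
values_total = 0
# Column names are strings, so convert the number before concatenating
for val in df["D" + str(column_num)]:
    # arr[0] is the threshold for D2, arr[1] for D3, and so on
    if val > arr[column_num - 2]:
        more_than_expected += 1
    values_total += 1
print(more_than_expected / values_total)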
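
To cover every column rather than just D2, the loop could be wrapped in a small function. A rough sketch, assuming the columns really are named D1, D2, ... as in the question; `proportion_above` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

def proportion_above(df, arr, column_num):
    """Share of values in column D<column_num> above the previous day's expected value."""
    values = df[f"D{column_num}"]
    threshold = arr[column_num - 2]  # arr[0] belongs to D2, arr[1] to D3, ...
    return sum(val > threshold for val in values) / len(values)

# Sample data from the question
df = pd.DataFrame({"D1": [5, 10, 3], "D2": [6, 2, 9], "D3": [9, 1, 2]})
arr = (2, 4, 5)

# Start at D2 because D1 has no previous day to compare against
results = {f"D{n}": proportion_above(df, arr, n) for n in range(2, df.shape[1] + 1)}
print(results)
```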

Hope that helps

Upvotes: 0
