Hrihaan
Hrihaan

Reputation: 285

Forming an array from column values based on a different column value condition

I have the following dataset.

  1. 10
  2. 20
  3. 30
  4. 40
  5. 50
  6. 60
  7. 70
  8. 80
  9. 90
  10. 100

My aim is to replace the values in Column 2 with the average of two successive values, when the value in Column 1 remains less than 5 and greater than 6. To be precise,in <5 range the values in Column 1 are 1,2,3 and 4 and corresponding column 2 values are 10,20,30,40. So, i want to take the average of (10,20)=15 and (30,40)=35. I want to do the same in the range >6 for Column 1, take the average of corresponding Column 2 values (70,80)=75 and (80,90)=95. I will not take the average of Column 2 values when Column 1 values do not fall in those two ranges (5 and 6) and corresponding Column 2 values (50 and 60) and finally create an array of Column 2 values based on these three conditions.

I tried the following approach:

import numpy as np
import pandas as pd
data= pd.read_table('/Users/Hrihaan/Desktop/Data.txt', dtype=float, header=None, sep='\s+').values
t=data[:,0]
df = pd.DataFrame({"x":t, "y":data[:,1]})
x=np.where(t<=4,data[:,1],np.nan)
x_1=np.nanmean(x.reshape(-1, 2), axis=1)
y=np.where((df.x>4)&(df.x<7), df.y,np.nan)
z=np.where(t>6,data[:,1],np.nan)
z_1=np.nanmean(z.reshape(-1, 2), axis=1)
A=np.concatenate((x_1,y,z_1), axis=0)
print(A)

I am getting the following output: [ 15. 35. nan nan nan nan nan nan nan 50. 60. nan nan nan nan nan nan nan 75. 95.]

My expected output is: [ 15. 35. 50. 60. 75. 95.]

Any help on how to get around the np.nan in my code would be really helpful.

Upvotes: 0

Views: 53

Answers (2)

Mr. T
Mr. T

Reputation: 12410

I genuinely have difficulties to see your bigger concept here. For your very specific problem, this would work:

import pandas as pd
#read your file
data= pd.read_table('test.txt', dtype = float, delim_whitespace = True, names = ["x", "y"])
#define rows you want to exclude
exclude_rows = set([5, 6])
#create new column with rolling mean of two rows
data["mean"] = data["y"].rolling(2).mean()
#overwrite rolling mean, when row should be excluded from calculating the average
data["mean"][data["x"].isin(exclude_rows)] = data["y"]
#filter data
A = data["mean"][(data["x"].isin(exclude_rows)) |  (data["x"] % 2 == 0)]

But what is your expected output, if you want to exclude for instance x = 4 and 6? Then you have several singular values, for which you didn't give any instruction, what should happen with them in the process of averaging.

Upvotes: 1

anishtain4
anishtain4

Reputation: 2402

This does what you want

a=np.vstack((np.arange(1,11),np.arange(10,110,10))).T
b=(a[:-1,1]+a[1:,1])/2
indL=np.argmax(a[:,0]>5)-1
indH=np.argmax(a[:,0]>6)
out=np.hstack((b[:indL:2],a[indL:indH,1],b[indH::2]))

Upvotes: 1

Related Questions