Reputation: 285
I have the following dataset.
My aim is to replace the values in Column 2 with the average of two successive values, when the value in Column 1 remains less than 5 and greater than 6. To be precise,in <5 range the values in Column 1 are 1,2,3 and 4 and corresponding column 2 values are 10,20,30,40. So, i want to take the average of (10,20)=15 and (30,40)=35. I want to do the same in the range >6 for Column 1, take the average of corresponding Column 2 values (70,80)=75 and (80,90)=95. I will not take the average of Column 2 values when Column 1 values do not fall in those two ranges (5 and 6) and corresponding Column 2 values (50 and 60) and finally create an array of Column 2 values based on these three conditions.
I tried the following approach:
import numpy as np
import pandas as pd
data= pd.read_table('/Users/Hrihaan/Desktop/Data.txt', dtype=float, header=None, sep='\s+').values
t=data[:,0]
df = pd.DataFrame({"x":t, "y":data[:,1]})
x=np.where(t<=4,data[:,1],np.nan)
x_1=np.nanmean(x.reshape(-1, 2), axis=1)
y=np.where((df.x>4)&(df.x<7), df.y,np.nan)
z=np.where(t>6,data[:,1],np.nan)
z_1=np.nanmean(z.reshape(-1, 2), axis=1)
A=np.concatenate((x_1,y,z_1), axis=0)
print(A)
I am getting the following output: [ 15. 35. nan nan nan nan nan nan nan 50. 60. nan nan nan nan nan nan nan 75. 95.]
My expected output is: [ 15. 35. 50. 60. 75. 95.]
Any help on how to get around the np.nan in my code would be really helpful.
Upvotes: 0
Views: 53
Reputation: 12410
I genuinely have difficulties to see your bigger concept here. For your very specific problem, this would work:
import pandas as pd
#read your file
data= pd.read_table('test.txt', dtype = float, delim_whitespace = True, names = ["x", "y"])
#define rows you want to exclude
exclude_rows = set([5, 6])
#create new column with rolling mean of two rows
data["mean"] = data["y"].rolling(2).mean()
#overwrite rolling mean, when row should be excluded from calculating the average
data["mean"][data["x"].isin(exclude_rows)] = data["y"]
#filter data
A = data["mean"][(data["x"].isin(exclude_rows)) | (data["x"] % 2 == 0)]
But what is your expected output, if you want to exclude for instance x = 4 and 6? Then you have several singular values, for which you didn't give any instruction, what should happen with them in the process of averaging.
Upvotes: 1
Reputation: 2402
This does what you want
a=np.vstack((np.arange(1,11),np.arange(10,110,10))).T
b=(a[:-1,1]+a[1:,1])/2
indL=np.argmax(a[:,0]>5)-1
indH=np.argmax(a[:,0]>6)
out=np.hstack((b[:indL:2],a[indL:indH,1],b[indH::2]))
Upvotes: 1