Reputation: 867
I have a question regarding the following code. I have a data set and a list, and I want to compare each value of the data set against two conditions: if the condition is true, keep the existing value in the data frame, otherwise set it to None. My code works fine for a small data set, but for my big data set it takes far too long and never produces any values. Is there a better solution?
new_data = data
for col in df.columns:
    for i in range(len(df)):
        if (df.iloc[i][col] > list_min[i]) & (df.iloc[i][col] < list_max[i]):
            new_data.set_value(i, col, df.iloc[i][col])
        else:
            new_data.set_value(i, col, None)
Thanks for any comments or an alternative solution.
This is my code that does not work:
data = pd.read_csv('./dataset/w.csv')
i = 0
data = data.applymap(np.log)
data = data.drop('time', axis=1)
q75_list = []
q25_list = []
iqr_list = []
min_list = []
max_list = []
new_data = data
for col in data.columns.values:
    q75_list.append(np.nanpercentile(data[col], 75))
    q25_list.append(np.nanpercentile(data[col], 25))
iqr_list = np.array(q75_list) - np.array(q25_list)
min_list = np.array(q25_list) - (iqr_list * 1.5)
max_list = np.array(q75_list) + (iqr_list * 1.5)
print("Max :\n", max_list, "\n Min :\n", min_list)
for col in data.columns:
    for (i, j) in [(i, j) for i in range(len(data)) for j in range(len(min_list))]:
        if (data.iloc[i][col] > min_list[j]) & (data.iloc[i][col] < max_list[j]):
            new_data.set_value(i, col, data.iloc[i][col])
        else:
            new_data.set_value(i, col, None)
new_data.to_csv('./dataset/result.csv', index=False)
Upvotes: 0
Views: 289
Reputation: 81
If I am understanding correctly what you are doing, there are a couple of places where you could vectorize things. See if this speeds things up:
q75s = data.quantile(.75)
q25s = data.quantile(.25)
mins = 2.5*q25s - 1.5*q75s
maxs = 2.5*q75s - 1.5*q25s
newdata = data.copy()
newdata[(data < mins) | (data > maxs)] = None
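As a possible follow-up (a sketch under the same assumptions, not part of the original answer): DataFrame.mask collapses the copy-and-blank step into a single expression, since the quantile Series align on the column labels and broadcast column-wise:

# Sketch: NaN out everything outside the IQR fences in one shot.
# mask() returns a new frame with NaN wherever the condition is True,
# mirroring the None assignment above.
newdata = data.mask((data < mins) | (data > maxs))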
Upvotes: 1
Reputation: 107767
Consider the if/then/else idiom using pandas.DataFrame.loc. The code below assumes list_min and list_max are lists equal in length to the number of rows in data.
for col in data.columns:
    new_data.loc[(data[col] > pd.Series(list_min)) &
                 (data[col] < pd.Series(list_max)), col] = data[col]
    new_data.loc[(data[col] < pd.Series(list_min)) |
                 (data[col] > pd.Series(list_max)), col] = None
To demonstrate with example random data of 10 columns and 50 rows (seeded for reproducibility):
Data
import pandas as pd
import numpy as np
pd.set_option('display.width', 1000)
np.random.seed(107)
data = pd.DataFrame([[np.random.randint(50) for _ in range(50)] for _ in range(10)]).T
print(data.head())
# 0 1 2 3 4 5 6 7 8 9
# 0 48 17 37 22 1 0 6 14 33 10
# 1 25 38 28 4 36 22 4 24 28 49
# 2 6 5 22 35 14 14 40 41 38 26
# 3 14 43 5 31 38 45 40 5 32 1
# 4 11 30 35 32 20 37 26 39 34 5
list_min = [np.random.randint(50) for _ in range(50)]
print(list_min[:10])
# [37, 17, 33, 24, 0, 46, 11, 4, 25, 41]
list_max = [np.random.randint(50) for _ in range(50)]
print(list_max[:10])
# [45, 37, 49, 38, 31, 9, 20, 39, 7, 36]
Operation
new_data = data.loc[:,]
for col in data.columns:
    new_data.loc[(data[col] > pd.Series(list_min)) &
                 (data[col] < pd.Series(list_max)), col] = data[col]
    new_data.loc[(data[col] < pd.Series(list_min)) |
                 (data[col] > pd.Series(list_max)), col] = None
print(new_data.head())
# 0 1 2 3 4 5 6 7 8 9
# 0 NaN NaN 37.0 NaN NaN NaN NaN NaN NaN NaN
# 1 25.0 NaN 28.0 NaN 36.0 22.0 NaN 24.0 28.0 NaN
# 2 NaN NaN NaN 35.0 NaN NaN 40.0 41.0 38.0 NaN
# 3 NaN NaN NaN 31.0 38.0 NaN NaN NaN 32.0 NaN
# 4 11.0 30.0 NaN NaN 20.0 NaN 26.0 NaN NaN 5.0
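If the per-column loop itself becomes a bottleneck, the same row-wise comparison can be written without a loop by aligning the bound Series on the index (a sketch assuming the same list_min and list_max as above, not part of the original answer):

# Sketch of a loop-free equivalent: lt/gt with axis=0 align the bounds on the
# row index, and mask() replaces values with NaN wherever the condition is True,
# mirroring the None assignment in the loop.
bounds_min = pd.Series(list_min, index=data.index)
bounds_max = pd.Series(list_max, index=data.index)
new_data = data.mask(data.lt(bounds_min, axis=0) | data.gt(bounds_max, axis=0))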
Upvotes: 1