How do you replace a corresponding row value with NaN?

Question

I have a CSV file with two columns, 'group' and 'x'. The value of 'group' is either 0 or 1, and the value of 'x' is a 3-digit number. I'm trying to calculate the means of subsets of the data: for example, the mean of 'x' over all rows that have a 0 in 'group', and the mean of 'x' over all rows that have a 1 in 'group'. Currently, the 0s in 'group' are replaced by NaN, but the corresponding 'x' values are unchanged, so the result is still the mean over all rows instead of the mean over the subset.

For a DataFrame, a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column ‘a’ and the value ‘z’ in column ‘b’ and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
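A minimal sketch of the dict form quoted above. The frame is hypothetical: unlike the real data, one 'x' value is set to 0 so that the difference between the plain form and the dict form of replace() is visible. In both cases only the matching cells change, never the rest of their row.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one 'x' value is 0 (unlike the real data) so the
# effect of restricting replace() to one column can be seen
df = pd.DataFrame({"group": [1, 0, 0, 1], "x": [324, 0, 237, 290]})

everywhere = df.replace(0, np.nan)              # 0 -> NaN in every column
group_only = df.replace({"group": 0}, np.nan)   # 0 -> NaN only in 'group'

print(everywhere)
print(group_only)
```

With the dict form the 0 in 'x' survives, but the 'x' values in rows where 'group' was 0 are still untouched, so this alone cannot blank the corresponding row values.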

I saw the documentation above but I can't use it since the values in column 'x' are all different. There are 1000 rows. I think it might have to do with axis but I'm still a bit fuzzy on that.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

normalData = pd.read_csv('NormalSample.csv')

normalData = normalData.replace(0, np.nan)

print(normalData.mean())

Sample of NormalSample.csv:

group	x
1	324
0	102
0	237
1	290

After normalData.replace(0, np.nan):

group	x
1	324
NaN	102
NaN	237
1	290
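A small reproduction using only the four sample rows above (not the full 1000-row file). It illustrates why the result does not change: replace(0, np.nan) turns matching cells into NaN without touching the other cells in those rows, and mean() skips NaN column by column, so the 'x' mean is still taken over every row.

```python
import numpy as np
import pandas as pd

# The four sample rows shown in the question
df = pd.DataFrame({"group": [1, 0, 0, 1], "x": [324, 102, 237, 290]})

# What the question's code does: every cell equal to 0 becomes NaN
replaced = df.replace(0, np.nan)

# Only 'group' contained zeros, so 'x' is unchanged and its mean is
# still the mean over all four rows, not over the group-1 rows
print(replaced.mean())
print(df["x"].mean())
```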

Long Doan · Accepted Answer

As I believe you only have 2 columns, it is convenient to use a boolean mask to set the whole row to NaN, like this:

import numpy as np
import pandas as pd

normalData = pd.read_csv('NormalSample.csv')

# Set every column of the rows where group == 0 to NaN
normalData[normalData['group'] == 0] = np.nan

# mean() skips NaN, so each column's mean now covers only the group-1 rows
print(normalData.mean())
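If the goal is literally what the title asks, blanking only the corresponding 'x' value while keeping 'group' intact, a sketch with .loc and a row mask could look like this. It uses the four sample rows instead of the CSV, and it converts 'x' to float first (an assumption on my part, to avoid relying on pandas silently upcasting an integer column when NaN is assigned).

```python
import numpy as np
import pandas as pd

# The four sample rows shown in the question
df = pd.DataFrame({"group": [1, 0, 0, 1], "x": [324, 102, 237, 290]})

# Make 'x' float up front so it can hold NaN
df["x"] = df["x"].astype(float)

# Blank only the 'x' value in rows where group == 0; 'group' keeps its 0s
df.loc[df["group"] == 0, "x"] = np.nan

# mean() skips NaN by default, so this is the mean of x over group-1 rows
print(df["x"].mean())
```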

However, if what you actually want is the mean of all x where group == 0 and the mean of all x where group == 1, I propose the following:

import pandas as pd

normalData = pd.read_csv('NormalSample.csv')

# Mean of 'x' over the rows belonging to each group
mean_0 = normalData.loc[normalData['group'] == 0, 'x'].mean()
mean_1 = normalData.loc[normalData['group'] == 1, 'x'].mean()
print(mean_0, mean_1)
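An alternative not used in the answer above is groupby, which computes the mean of 'x' for every distinct value of 'group' in one call and keeps working if more groups are added later. A sketch on the four sample rows:

```python
import pandas as pd

# The four sample rows shown in the question
df = pd.DataFrame({"group": [1, 0, 0, 1], "x": [324, 102, 237, 290]})

# One mean of 'x' per distinct value of 'group' (here 0 and 1)
means = df.groupby("group")["x"].mean()
print(means)

# Individual values, equivalent to mean_0 / mean_1 above
mean_0 = means.loc[0]
mean_1 = means.loc[1]
```

On the full file, replace the literal frame with pd.read_csv('NormalSample.csv').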
