Reputation: 51
I have a CSV file with two columns, 'group' and 'x'. The value of 'group' is either 0 or 1, and the value of 'x' is some 3-digit number. I'm trying to calculate the means of subsets of the data: the mean of 'x' over all rows that have a 0 in 'group', and the mean of 'x' over all rows that have a 1 in 'group'. Currently, the 0s in 'group' are being replaced by NaN, but the 'x' values are unchanged, so the result is still the overall mean instead of the subset mean.
For a DataFrame, a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column ‘a’ and the value ‘z’ in column ‘b’ and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
I saw the documentation above but I can't use it since the values in column 'x' are all different. There are 1000 rows. I think it might have to do with axis but I'm still a bit fuzzy on that.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
normalData = pd.read_csv('NormalSample.csv')
normalData = normalData.replace(0, np.nan)
print(normalData.mean())
Sample input:

group | x |
---|---|
1 | 324 |
0 | 102 |
0 | 237 |
1 | 290 |
Result after the replace call:

group | x |
---|---|
1 | 324 |
NaN | 102 |
NaN | 237 |
1 | 290 |
Upvotes: 1
Views: 106
Reputation: 313
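Since I believe you only have two columns, a convenient option is to assign NaN to every row where 'group' is 0 using a boolean mask. This blanks the 'x' value in those rows as well, so mean() only sees the rows where 'group' is 1: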
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
normalData = pd.read_csv('NormalSample.csv')
normalData[normalData['group'] == 0] = np.nan
print(normalData.mean())
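To see why the row mask behaves differently from replace(0, np.nan), here is a minimal sketch that uses the four sample rows from the question in place of NormalSample.csv. The in-memory DataFrame and the names `replaced`, `masked`, `mean_after_replace` and `mean_after_mask` are assumptions made for illustration. mean() skips NaN in each column independently, so blanking only 'group' leaves the mean of 'x' untouched, while blanking the whole row drops the 'x' value too.

```python
import numpy as np
import pandas as pd

# Four sample rows from the question, standing in for NormalSample.csv.
df = pd.DataFrame({"group": [1, 0, 0, 1], "x": [324, 102, 237, 290]})

# replace() only changes cells equal to 0, so 'x' keeps all four values.
replaced = df.replace(0, np.nan)
mean_after_replace = replaced["x"].mean()
print(mean_after_replace)  # mean of all four x values

# Cast to float first so that assigning NaN needs no dtype change,
# then blank every column in the rows where group == 0.
masked = df.astype(float)
masked[masked["group"] == 0] = np.nan
mean_after_mask = masked["x"].mean()
print(mean_after_mask)  # mean of x over the group == 1 rows only
```

Note that this approach only yields the mean for the rows that were kept, so it would have to be repeated with the opposite condition to get the mean for group 0.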
However, what I believe you actually want is the mean of all 'x' where group = 0 and the mean of all 'x' where group = 1. For that, I suggest filtering the rows for each group and taking the mean of 'x' directly:
import pandas as pd
normalData = pd.read_csv('NormalSample.csv')
# Select the rows of each group with a boolean mask, then average column 'x'
mean_0 = normalData.loc[normalData['group'] == 0, 'x'].mean()
mean_1 = normalData.loc[normalData['group'] == 1, 'x'].mean()
print(mean_0, mean_1)
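Both subset means can also be computed in one step with groupby, which avoids writing one line per group value. This is a minimal sketch, again assuming the four sample rows from the question rather than the real file; the name `group_means` is illustrative.

```python
import pandas as pd

# Four sample rows from the question, standing in for NormalSample.csv.
df = pd.DataFrame({"group": [1, 0, 0, 1], "x": [324, 102, 237, 290]})

# One mean of 'x' per distinct value in 'group'.
# The result is a Series indexed by the group label.
group_means = df.groupby("group")["x"].mean()
print(group_means)

# Individual means can be looked up by group label.
mean_0 = group_means.loc[0]
mean_1 = group_means.loc[1]
```

With the real data, the same call on the DataFrame returned by pd.read_csv should give one mean per group without any NaN replacement.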
Upvotes: 1