abuteau
abuteau

Reputation: 7441

NaN results with pandas mean function

I try to have the mean of a row in my Python DataFrame, but I get a NaN return for every row. Why I get this result and how can I solve that ?

Goog key ratios : http://www.gogofile.com/Default.aspx?p=sc&ID=635118193040317500_6234

path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path, skiprows = 2, names = ['Y0','Y1','Y2','Y3','Y4','Y5','Y6','Y7','Y8','Y9','Y10'], index_col = 0)
noTTM = data.iloc[:,0:10]
print(data.mean(1))
grossMargin = noTTM[2:3]
print(grossMargin.mean(1))

Return:

Gross Margin %   NaN
dtype: float64

Regards,

Upvotes: 7

Views: 23029

Answers (1)

Phillip Cloud
Phillip Cloud

Reputation: 25662

The reason you have a bunch of nan values is because you don't have homogeneous column types. So, for example when you try to average across the columns it doesn't make sense because pandas.read_csv will only convert into a numeric column if it makes sense, e.g., you don't have string dates or other text in the same column as numbers.

I recommend also that you do a simple df.head() to check your data before doing even simple analyses. It will save you a lot of time in the future when you're wondering why your output is "weird".

That said, you can do the following to convert things to numeric values, but this isn't necessarily guaranteed to make sense:

In [35]: df = read_csv('GOOG Key Ratios.csv', skiprows=2, index_col=0, names=['Y%d' % i for i in range(11)])

In [36]: df.head() # not homogeneously typed columns
Out[36]:
                               Y0       Y1       Y2       Y3       Y4  \
NaN                       2003-12  2004-12  2005-12  2006-12  2007-12
Revenue USD Mil             1,466    3,189    6,139   10,605   16,594
Gross Margin %               57.3     54.3     58.1     60.2     59.9
Operating Income USD Mil      342      640    2,017    3,550    5,084
Operating Margin %           23.4     20.1     32.9     33.5     30.6

                               Y5       Y6       Y7       Y8       Y9     Y10
NaN                       2008-12  2009-12  2010-12  2011-12  2012-12     TTM
Revenue USD Mil            21,796   23,651   29,321   37,905   50,175  55,797
Gross Margin %               60.4     62.6     64.5     65.2     58.9    56.7
Operating Income USD Mil    6,632    8,312   10,381   11,742   12,760  12,734
Operating Margin %           30.4     35.1     35.4     31.0     25.4    22.8

In [37]: df.convert_objects(convert_numeric=True).head()
Out[37]:
                             Y0     Y1    Y2    Y3    Y4    Y5    Y6    Y7    Y8    Y9   Y10
NaN                         NaN    NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Revenue USD Mil             NaN    NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Gross Margin %             57.3   54.3  58.1  60.2  59.9  60.4  62.6  64.5  65.2  58.9  56.7
Operating Income USD Mil  342.0  640.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Operating Margin %         23.4   20.1  32.9  33.5  30.6  30.4  35.1  35.4  31.0  25.4  22.8

Upvotes: 6

Related Questions