Reputation: 41
I have this dataframe:
 #   Column                  Non-Null Count  Dtype
 0   nombre                  74 non-null     object
 1   fabricante              74 non-null     object
 2   calorias                74 non-null     int64
 3   proteina                74 non-null     int64
 4   grasa                   74 non-null     int64
 5   sodio                   74 non-null     int64
 6   fibra dietaria          74 non-null     float64
 7   carbohidratos           74 non-null     float64
 8   azúcar                  74 non-null     int64
 9   potasio                 74 non-null     int64
 10  vitaminas y minerales   74 non-null     int64
I am trying to extract information like this:
cereal_df.loc[cereal_df['fabricante'] == 'Kelloggs', 'sodio']
The output is what I want to extract in this case:
2     260
3     140
6     125
16    290
17     90
19    140
21    220
24    125
25    200
26      0
27    240
37    170
38    170
43    150
45    190
46    220
47    170
50    320
55    210
57      0
59    290
63     70
64    230
Name: sodio, dtype: int64
That works so far, but the problem appears when I wrap the selection in a function (in order to get a confidence interval for the mean):
def valor_medio_intervalo(fabricante, variable, confianza):
    subconjunto = cereal_df.loc[cereal_df['fabricante'] == fabricante, cereal_df[variable]]
    inicio, final = sm.stats.DescrStatsW(subconjunto[variable]).zconfint_mean(alpha = 1 - confianza)
    return inicio, final
Then I run the function:
valor_medio_intervalo('Kelloggs', 'azúcar', 0.95)
And the output is:
KeyError Traceback (most recent call last)
<ipython-input-57-11420ac4d15f> in <module>()
1 #TEST_CELL
----> 2 valor_medio_intervalo('Kelloggs', 'azúcar', 0.95)
7 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1296 if missing == len(indexer):
1297 axis_name = self.obj._get_axis_name(axis)
-> 1298 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1299
1300 # We (temporarily) allow for some missing keys with .loc, except in
KeyError: "None of [Int64Index([ 6, 8, 5, 0, 8, 10, 14, 8, 6, 5, 12, 1, 9, 7, 13, 3, 2,\n 12, 13, 7, 0, 3, 10, 5, 13, 11, 7, 12, 12, 15, 9, 5, 3, 4,\n 11, 10, 11, 6, 9, 3, 6, 12, 3, 13, 6, 9, 7, 2, 10, 14, 3,\n 0, 0, 6, -1, 12, 8, 6, 2, 3, 0, 0, 0, 15, 3, 5, 3, 14,\n 3, 3, 12, 3, 3, 8],\n dtype='int64')] are in the [columns]"
I do not understand what is going on. I would appreciate any help or hints. Thanks in advance.
Upvotes: 1
Views: 132
Reputation: 41
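I found the answer by examining the code:
def valor_medio_intervalo(fabricante, variable, confianza):
    subconjunto = cereal_df.loc[cereal_df['fabricante'] == fabricante, cereal_df[variable]]
    inicio, final = sm.stats.DescrStatsW(subconjunto[variable]).zconfint_mean(alpha = 1 - confianza)
    return inicio, final
There are two problems. In the first line of the function body, the column selector passed to .loc is cereal_df[variable], which is the column's values rather than its name. pandas then looks for columns labelled 6, 8, 5, ... and raises the KeyError shown in the question. The selector should be just variable.
With that change .loc returns a Series, so in the line
    inicio, final = sm.stats.DescrStatsW(subconjunto[variable]).zconfint_mean(alpha = 1 - confianza)
the
(subconjunto[variable])
is just
(subconjunto)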
Upvotes: 1