vladimir

Reputation: 41

Extract information to work with in pandas

I have this dataframe:

 #   Column                  Non-Null Count  Dtype
 0   nombre                  74 non-null     object
 1   fabricante              74 non-null     object
 2   calorias                74 non-null     int64
 3   proteina                74 non-null     int64
 4   grasa                   74 non-null     int64
 5   sodio                   74 non-null     int64
 6   fibra dietaria          74 non-null     float64
 7   carbohidratos           74 non-null     float64
 8   azúcar                  74 non-null     int64
 9   potasio                 74 non-null     int64
 10  vitaminas y minerales   74 non-null     int64

I am trying to extract information like this:

cereal_df.loc[cereal_df['fabricante'] == 'Kelloggs', 'sodio']

The output is what I expected to extract in this case:

2     260
3     140
6     125
16    290
17     90
19    140
21    220
24    125
25    200
26      0
27    240
37    170
38    170
43    150
45    190
46    220
47    170
50    320
55    210
57      0
59    290
63     70
64    230
Name: sodio, dtype: int64

That is what I need so far, but the problem appears when I try to write a function like this (in order to get the confidence interval of the mean):

def valor_medio_intervalo(fabricante, variable, confianza):
   subconjunto = cereal_df.loc[cereal_df['fabricante'] == fabricante, cereal_df[variable]]
   inicio, final  = sm.stats.DescrStatsW(subconjunto[variable]).zconfint_mean(alpha = 1 - confianza) 
   return inicio, final

Then I run the function:

valor_medio_intervalo('Kelloggs', 'azúcar', 0.95)

And the output is:


KeyError                                  Traceback (most recent call last)
<ipython-input-57-11420ac4d15f> in <module>()
      1 #TEST_CELL
----> 2 valor_medio_intervalo('Kelloggs', 'azúcar', 0.95)

7 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1296             if missing == len(indexer):
   1297                 axis_name = self.obj._get_axis_name(axis)
-> 1298                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1299 
   1300             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Int64Index([ 6,  8,  5,  0,  8, 10, 14,  8,  6,  5, 12,  1,  9,  7, 13,  3,  2,\n            12, 13,  7,  0,  3, 10,  5, 13, 11,  7, 12, 12, 15,  9,  5,  3,  4,\n            11, 10, 11,  6,  9,  3,  6, 12,  3, 13,  6,  9,  7,  2, 10, 14,  3,\n             0,  0,  6, -1, 12,  8,  6,  2,  3,  0,  0,  0, 15,  3,  5,  3, 14,\n             3,  3, 12,  3,  3,  8],\n           dtype='int64')] are in the [columns]"

I do not understand what is going on. I would appreciate any help or hints. Thanks in advance.

Upvotes: 1

Views: 132

Answers (1)

vladimir

Reputation: 41

I found the answer by examining the code:

def valor_medio_intervalo(fabricante, variable, confianza):
   subconjunto = cereal_df.loc[cereal_df['fabricante'] == fabricante, cereal_df[variable]]
   inicio, final  = sm.stats.DescrStatsW(subconjunto[variable]).zconfint_mean(alpha = 1 - confianza)
   return inicio, final

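In the line

inicio, final  = sm.stats.DescrStatsW(subconjunto[variable]).zconfint_mean(alpha = 1 - confianza)

the argument

(subconjunto[variable])

should be just

(subconjunto)

because .loc with a single column label already returns a Series. For the same reason, the column selector in the .loc call has to be the label variable, not cereal_df[variable]: passing the column's values as the selector is what raises the KeyError, since pandas then looks for columns named 6, 8, 5, and so on.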

Upvotes: 1
