flmlopes

Reputation: 62

pandas dataframe does not recognize columns

So I am trying to add a column to a dataframe and use another column to calculate its value.

import pandas as pd
import pandasql as pdsql
import csv

def filter_by_regular(filename):
    turnstile_data = pd.read_csv(filename)
    turnstile_data = pd.DataFrame(turnstile_data)
    q = "SELECT * FROM turnstile_data WHERE 'DESCn == REGULAR';"
    return turnstile_data

turnstile_regular = filter_by_regular('master_file.txt')
turnstile_regular.head()

turnstile_regular.columns

Index([u'C/A', u' UNIT', u' SCP', u' DATEn', u' TIMEn', u' DESCn',
   u' ENTRIESn', u' EXITSn'],
  dtype='object')

But when I try to access the ENTRIESn column to compute a new column from its values, Python does not recognize it.

import pandas

def get_hourly_entries(df):
    df['ENTRIESn_hourly'] = df.ENTRIESn.diff(1)
    df.ENTRIESn_hourly.fillna(1, inplace = True)
    return df

turnstile_hourly = get_hourly_entries(turnstile_regular)
turnstile_hourly.head()

AttributeError                            Traceback (most recent call last)
<ipython-input-70-890cc0bc29bd> in <module>()
      6     return df
      7 
----> 8 turnstile_hourly = get_hourly_entries(turnstile_regular)
      9 turnstile_hourly.head()

<ipython-input-70-890cc0bc29bd> in get_hourly_entries(df)
      2 
      3 def get_hourly_entries(df):
----> 4     df['ENTRIESn_hourly'] = df.ENTRIESn.diff(1)
      5     df.ENTRIESn_hourly.fillna(1, inplace = True)
      6     return df

/Users/flmlopes/anaconda3/envs/py2/lib/python2.7/site-packages/pandas/core/generic.pyc in __getattr__(self, name)
   3079             if name in self._info_axis:
   3080                 return self[name]
-> 3081             return object.__getattribute__(self, name)
   3082 
   3083     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'ENTRIESn'

Does anyone know how I can solve this?

Upvotes: 1

Views: 6567

Answers (1)

Mike Müller

Reputation: 85432

These are your column labels:

Index([u'C/A', u' UNIT', u' SCP', u' DATEn', u' TIMEn', u' DESCn',
   u' ENTRIESn', u' EXITSn'],
  dtype='object')

Note the leading space in the name, which is why attribute access with df.ENTRIESn fails:

 u' ENTRIESn'

Therefore, change:

df['ENTRIESn_hourly'] = df.ENTRIESn.diff(1)

to:

df['ENTRIESn_hourly'] = df[u' ENTRIESn'].diff(1)

Alternatively, fix your columns first:

turnstile_regular.columns = [x.strip() for x in turnstile_regular.columns]
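
As a rough end-to-end sketch of the second approach: the snippet below builds a small, made-up CSV whose header has a space after each comma (mimicking the column names in the question), strips the names with the `.str.strip()` accessor, and then computes the hourly difference. The sample values and the variable name `raw` are assumptions for illustration only, and the code is written for Python 3 rather than the Python 2.7 environment shown in the traceback.

```python
import io

import pandas as pd

# Hypothetical sample data: only the header has a space after each comma,
# which reproduces names such as ' ENTRIESn'. The values are made up.
raw = (
    "C/A, UNIT, DESCn, ENTRIESn\n"
    "A002,R051,REGULAR,100\n"
    "A002,R051,REGULAR,130\n"
    "A002,R051,REGULAR,175\n"
)

df = pd.read_csv(io.StringIO(raw))

# Strip leading/trailing whitespace from every column name.
df.columns = df.columns.str.strip()

# The column is now named exactly 'ENTRIESn', so the lookup succeeds.
# The first row has no previous row to diff against, so fill it with 1
# as in the question's code.
df['ENTRIESn_hourly'] = df['ENTRIESn'].diff(1).fillna(1)

print(df)
```

Assigning the result of `fillna` back, instead of calling it with `inplace = True` on an attribute, avoids relying on in-place modification of a column obtained through attribute access. `pd.read_csv` also has a `skipinitialspace` parameter that skips spaces after the delimiter, which may be worth trying at load time.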

Upvotes: 2
