Rohit

Reputation: 6020

Condition in Pandas

I have a very peculiar problem in Pandas: one condition works but the other does not. You may download the linked file to test my code. Thanks!

I have a file (stars.txt) that I read in with Pandas. I would like to create two groups: (1) rows with Log_G < 4.0 and (2) rows with Log_G > 4.0. In my code (see below) I can successfully get the rows for group (1):

    Kepler_ID            RA           Dec   Teff  Log_G       g       H
3     2305372  19 27 57.679  +37 40 21.90   5664  3.974  14.341  12.201
14    2708156  19 21 08.906  +37 56 11.44  11061  3.717  10.672  10.525
19    2997455  19 32 31.296  +38 07 40.04   4795  3.167  14.694  11.500
34    3352751  19 36 17.249  +38 25 36.91   7909  3.791  13.541  12.304
36    3440230  19 21 53.100  +38 31 42.82   7869  3.657  13.706  12.486

But for some reason I cannot get group (2). Instead of the rows, the code prints the following:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 90 entries, 0 to 108
Data columns (total 7 columns):
Kepler_ID    90  non-null values
RA           90  non-null values
Dec          90  non-null values
Teff         90  non-null values
Log_G        90  non-null values
g            90  non-null values
H            90  non-null values
dtypes: float64(4), int64(1), object(2)

Here's my code:

#------------------------------------------
# IMPORT STATEMENTS 
#------------------------------------------
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#------------------------------------------
# READ FILE AND ASSOCIATE COMPONENTS 
#------------------------------------------
star_file = 'stars.txt'
header_row = ['Kepler_ID', 'RA','Dec','Teff', 'Log_G', 'g', 'H']
df = pd.read_csv(star_file, names=header_row, skiprows=2)
#------------------------------------------
# ASSOCIATE VARIABLES 
#------------------------------------------
Kepler_ID  = df['Kepler_ID']
#RA         = df['RA']         
#Dec        = df['Dec']
Teff       = df['Teff']
Log_G      = df['Log_G']
g          = df['g']
H          = df['H']
#------------------------------------------
# SUBSTITUTE MISSING DATA WITH NAN 
#------------------------------------------ 
df = df.replace('', np.nan)
#------------------------------------------
# CHANGE DATA TYPE OF THE REST OF DATA TO FLOAT 
#------------------------------------------ 
df[['Teff', 'Log_G', 'g', 'H']] = df[['Teff', 'Log_G', 'g', 'H']].astype(float)
#------------------------------------------
# SORTING SPECTRA TYPES FOR GIANTS  
#------------------------------------------
# FIND GIANTS IN THE SAMPLE 
giants = df[(df['Log_G'] < 4.)]
#print giants
# FIND DWARFS IN THE SAMPLE 
dwarfs = df[(df['Log_G'] > 4.)]
print(dwarfs)

Upvotes: 1

Views: 169

Answers (1)

Andy Hayden

Reputation: 375445

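This is not an error. The filter worked: dwarfs has 90 rows, which is more than the display limit, so pandas prints a summarized view of the DataFrame (the same output as df.info()) instead of the full table. The giants group is small enough to be printed in full. For example: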

In [11]: df = pd.DataFrame([[2, 1], [3, 4]])

In [12]: df
Out[12]: 
   0  1
0  2  1
1  3  4

In [13]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
0    2  non-null values
1    2  non-null values
dtypes: int64(2)

Which view is displayed is determined by several display options, for example max_rows:

In [14]: pd.options.display.max_rows
Out[14]: 60

In [15]: pd.options.display.max_rows = 120

In pandas 0.13 this behaviour changed: instead of the summary, you will see the first max_rows rows followed by "...".
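If you want to see every row without changing the global setting, the following is a minimal sketch of two ways to do it. It assumes a reasonably recent pandas, and the DataFrame here is a made-up stand-in for stars.txt (only a Log_G column, with invented values), not the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for stars.txt: 109 rows, a single Log_G column
# (the values are invented for illustration only).
df = pd.DataFrame({"Log_G": np.linspace(3.0, 5.0, 109)})

dwarfs = df[df["Log_G"] > 4.0]

# The row count shows the filter worked, however the frame is printed.
n_dwarfs = len(dwarfs)
print(n_dwarfs)

# Option 1: lift the row limit only inside this block.
with pd.option_context("display.max_rows", None):
    full_repr = repr(dwarfs)
    print(full_repr)

# Option 2: to_string() renders every row regardless of display.max_rows.
full_text = dwarfs.to_string()
print(full_text)
```

option_context restores the previous max_rows value when the block exits, so the rest of the session keeps its normal display settings.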

Upvotes: 2
