Accessing data from a count method in Pandas

Question

I am retrieving information from a DataFrame using the count method, like this:

df = pd.DataFrame.from_csv(csv_file)

for i in df['OPTION'].unique():
   count = df.loc[df['OPTION'] == i].count
   print count

This prints:

DatetimeIndex: 4641 entries, 2014-01-08 02:02:05.740845 to 2014-01-08 02:58:56.405287
Data columns (total 3 columns):
OPTION 4641 non-null values
SELL 4641 non-null values
BUY 4641 non-null values
dtypes: float64(2), object(1)>

This is the kind of information I'm after, but I would like to access values such as the count (4641 in this example) or the "non-null values" figures in my code, not just print them. How should I access this kind of data?

Nipun Batra · Accepted Answer

First, you are effectively splitting the data into groups, so this is better expressed with groupby:

grouped = df.groupby('OPTION')

Next, you want to work with the individual groups in this grouped object. Iterate over the groups, get the count for each one (which is simply the length of its index), and extract specific columns (e.g. SELL):

for name, group in grouped:
    print("Option name: {}".format(name))
    # Count of entries for this OPTION
    print("Count: {}".format(len(group.index)))
    # Accessing specific columns, say SELL
    print("SELL for this option\n")
    print(group["SELL"])
    # Summary for SELL for this option
    print("Summary\n")
    print(group["SELL"].describe())
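
If the goal is to keep the numbers in variables rather than print them, the per-group counts can also be collected without an explicit loop. The following is a minimal sketch, not the only way to do it: the sample frame and its values are made up to stand in for the CSV in the question, and only the column names OPTION, SELL and BUY are taken from it.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV in the question
df = pd.DataFrame({
    "OPTION": ["A", "A", "B", "B", "B"],
    "SELL": [1.0, 2.0, None, 4.0, 5.0],
    "BUY": [1.5, 2.5, 3.5, 4.5, 5.5],
})

grouped = df.groupby("OPTION")

# Rows per group (rows containing NaN are included), as a Series indexed by OPTION
rows_per_option = grouped.size()

# Non-null values per column and per group, as a DataFrame indexed by OPTION
non_null_per_option = grouped.count()

# Pull individual numbers out as plain Python ints
n_rows_b = int(rows_per_option["B"])
n_sell_b = int(non_null_per_option.loc["B", "SELL"])
```

Here size() corresponds to the "entries" figure in the printed summary, while count() corresponds to the "non-null values" figures, so the two differ only when a column has missing data.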

A good reference for split-apply-combine is the official Pandas documentation on groupby. Quoting from it:

By “group by” we are referring to a process involving one or more of the following steps:
Splitting the data into groups based on some criteria
Applying a function to each group independently
Combining the results into a data structure
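
The three steps quoted above can be spelled out by hand to see what groupby does in one call. This is only an illustrative sketch: the sample values are invented, and taking the mean of SELL is an arbitrary choice of function to apply.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question
df = pd.DataFrame({
    "OPTION": ["A", "A", "B"],
    "SELL": [1.0, 3.0, 10.0],
})

# Split: one sub-frame per distinct OPTION value
pieces = {name: group for name, group in df.groupby("OPTION")}

# Apply: compute a value for each piece independently
means = {name: group["SELL"].mean() for name, group in pieces.items()}

# Combine: gather the per-group results into a single Series
combined = pd.Series(means, name="SELL")

# The same three steps expressed as a single groupby call
shortcut = df.groupby("OPTION")["SELL"].mean()
```

Both combined and shortcut hold one mean per OPTION value; the single-call form is usually preferable, and the manual version is shown only to make the steps visible.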
