Reputation: 3786
I am retrieving information from a DataFrame using the count method, like this:
df = pd.DataFrame.from_csv(csv_file)
for i in df['OPTION'].unique():
    count = df.loc[df['OPTION'] == i].count
    print count
This returns:
DatetimeIndex: 4641 entries, 2014-01-08 02:02:05.740845 to 2014-01-08 02:58:56.405287
Data columns (total 3 columns):
OPTION 4641 non-null values
SELL 4641 non-null values
BUY 4641 non-null values
dtypes: float64(2), object(1)>
This is the kind of information I'm after, but I would like to access values such as the count (4641 in this example) or the number of non-null values from my code, not just print them out. How should I access this kind of data?
Upvotes: 0
Views: 69
Reputation: 11387
First, you are effectively creating groups of data, so this is better expressed with groupby:
grouped = df.groupby('OPTION')
Next, you want to work with the individual groups in this grouped object. Iterate over the groups, extract the count (which is simply the length of the group's index), and pull out specific columns (e.g. SELL):
for name, group in grouped:
    print("Option name: {}".format(name))
    # Count of entries for this OPTION
    print("Count: {}".format(len(group.index)))
    # Accessing specific columns, say SELL
    print("SELL for this option\n")
    print(group["SELL"])
    # Summary for SELL for this option
    print("Summary\n")
    print(group["SELL"].describe())
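If the goal is to use the counts in code rather than print them, something like the following may help. It is only a sketch: the data is made up for illustration (only the OPTION, SELL and BUY column names come from the question), and it assumes a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the CSV from the question.
df = pd.DataFrame({
    "OPTION": ["A", "A", "B", "B", "B"],
    "SELL": [1.0, np.nan, 3.0, 4.0, 5.0],
    "BUY": [1.5, 2.5, np.nan, np.nan, 5.5],
})

grouped = df.groupby("OPTION")

# Number of rows per group (NaN rows included), as a Series indexed by OPTION.
rows_per_option = grouped.size()

# Number of non-null values per column and per group, as a DataFrame.
non_null = grouped.count()

# Individual values can then be read out like any other Series/DataFrame entry.
count_b = int(rows_per_option["B"])
sell_non_null_a = int(non_null.loc["A", "SELL"])
```

As a side note, the loop in the question refers to `count` without parentheses, which appears to be why a description of the frame is printed instead of numbers; calling `count()` returns the non-null counts as a Series.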
A good reference for split-apply-combine is the official pandas documentation. Quoting from it:
By “group by” we are referring to a process involving one or more of the following steps:
- Splitting the data into groups based on some criteria
- Applying a function to each group independently
- Combining the results into a data structure
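The three steps in that quote can be seen in a single expression. The sketch below uses made-up data and illustrative variable names; only the OPTION and SELL column names are taken from the question.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "OPTION": ["A", "A", "B"],
    "SELL": [1.0, 3.0, 10.0],
})

# Split by OPTION, apply count and mean to SELL within each group,
# and combine the results into one DataFrame indexed by OPTION.
summary = df.groupby("OPTION")["SELL"].agg(["count", "mean"])

# The combined result is ordinary data that can be indexed in code.
mean_a = summary.loc["A", "mean"]
```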
Upvotes: 1