IcemanBerlin

Reputation: 3427

pandas python how to count the number of records or rows in a dataframe

Obviously new to pandas. How can I simply count the number of records in a DataFrame?

I would have thought something as simple as this would do it, and I can't seem to find the answer in searches, probably because it is too simple.

cnt = df.count
print cnt

The above code actually just prints the whole df.

Upvotes: 37

Views: 264983

Answers (6)

Mounesh

Reputation: 744

I used the pandas library for this. Here is the code:

import pandas as pd


name_of_file =  "test.xlsx"
data = pd.read_excel(name_of_file)

required_column_name = "Post test Number"

print(len(data[required_column_name]))
# data["Post test Number"].count() also works, but it counts only non-NA values
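
A note on the commented alternative: len() and count() only agree when the column has no missing values. Here is a minimal sketch, assuming a small in-memory frame in place of test.xlsx (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet; one value is missing.
data = pd.DataFrame({"Post test Number": [10.0, np.nan, 30.0, 40.0]})
column = data["Post test Number"]

n_rows = len(column)        # counts every row, including the NaN
n_present = column.count()  # counts only non-NA values

print(n_rows, n_present)
```

If you want the number of records, len() is the safer of the two.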

Upvotes: 0

Sharhabeel Hamdan

Reputation: 1549

A simple way to get the record count (note that this counts only the non-NA values in the first column):

df.count()[0]
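
One caveat worth checking: df.count() returns the non-NA count per column, so its first entry matches the row count only if the first column has no missing values. A small sketch with made-up data (the column names "a" and "b" are illustrative), using .iloc[0] for the positional lookup:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

per_column = df.count()           # Series: non-NA count for each column
first_entry = per_column.iloc[0]  # non-NA count of column "a"
total_rows = len(df)              # every row, regardless of missing values

print(first_entry, total_rows)
```

Here the two numbers differ because column "a" contains a NaN.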

Upvotes: 0

user2314737

Reputation: 29297

To get the number of rows in a dataframe use:

df.shape[0]

(and df.shape[1] to get the number of columns).

As an alternative you can use

len(df)

or

len(df.index)

(and len(df.columns) for the columns)
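
The options above can be checked side by side. A minimal sketch on an arbitrary 8x3 frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=["A", "B", "C"])

n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple

# All three row counts agree, as do both column counts.
assert n_rows == len(df) == len(df.index)
assert n_cols == len(df.columns)

print(n_rows, n_cols)
```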

shape is more versatile and more convenient than len(), especially for interactive work (it just needs to be appended to the expression), but len is a bit faster (see also this answer).

Avoid count(): it returns the number of non-NA/null observations along the requested axis, not the number of rows.

len(df.index) is faster

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=['A', 'B', 'C'])
df.loc[5, 'A'] = np.nan  # .loc avoids chained assignment
df
# Out:
#     A   B   C
# 0   0   1   2
# 1   3   4   5
# 2   6   7   8
# 3   9  10  11
# 4  12  13  14
# 5 NaN  16  17
# 6  18  19  20
# 7  21  22  23

%timeit df.shape[0]
# 100000 loops, best of 3: 4.22 µs per loop

%timeit len(df)
# 100000 loops, best of 3: 2.26 µs per loop

%timeit len(df.index)
# 1000000 loops, best of 3: 1.46 µs per loop
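
%timeit is an IPython magic, so it is not available in a plain script. A rough equivalent using the standard-library timeit module might look like this (the lambda wrappers add a small constant overhead, and absolute numbers will vary by machine and pandas version):

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=["A", "B", "C"])

candidates = {
    "df.shape[0]": lambda: df.shape[0],
    "len(df)": lambda: len(df),
    "len(df.index)": lambda: len(df.index),
}

timings = {}
for label, func in candidates.items():
    # Total seconds for 100,000 calls of each candidate.
    timings[label] = timeit.timeit(func, number=100_000)
    print(f"{label}: {timings[label]:.4f} s")
```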

df.__len__ is just a call to len(df.index)

import inspect 
print(inspect.getsource(pd.DataFrame.__len__))
# Out:
#     def __len__(self):
#         """Returns length of info axis, but here we use the index """
#         return len(self.index)

Why you should not use count()

df.count()
# Out:
# A    7
# B    8
# C    8

Upvotes: 52

Surya Chhetri

Reputation: 11568

Simply use row_num = df.shape[0], which gives the number of rows. Here's an example:

import pandas as pd
import numpy as np

In [322]: df = pd.DataFrame(np.random.randn(5,2), columns=["col_1", "col_2"])

In [323]: df
Out[323]: 
      col_1     col_2
0 -0.894268  1.309041
1 -0.120667 -0.241292
2  0.076168 -1.071099
3  1.387217  0.622877
4 -0.488452  0.317882

In [324]: df.shape
Out[324]: (5, 2)

In [325]: df.shape[0]   ## Gives no. of rows/records
Out[325]: 5

In [326]: df.shape[1]   ## Gives no. of columns
Out[326]: 2

Upvotes: 10

ekta

Reputation: 1620

The NaN example above misses one piece, which makes it less generic. To do this more generically, use df['column_name'].value_counts(). This gives you the count of each value in that column.

import pandas as pd

d = ['A', 'A', 'A', 'B', 'C', 'C', " ", " ", " ", " ", " ", "-1"]  # for simplicity

df = pd.DataFrame(d)
df.columns = ["col1"]
df["col1"].value_counts() 
      5
A     3
C     2
-1    1
B     1
dtype: int64
"""len(df) gives 12 and the counts above also sum to 12, so there are no real NaNs here; the 5 entries with a blank label are the whitespace strings " ". value_counts() thus also gives you a peek at other invalid entries that you might want to ignore, such as -1, 0, or empty strings."""

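Note that value_counts() drops real NaN values by default, so the counts only add up to len(df) when no NaN is present. A short sketch with made-up data that passes dropna=False to count them as well:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["A", "A", "B", " ", np.nan, np.nan]})

counts = df["col1"].value_counts()                      # NaN excluded by default
counts_with_na = df["col1"].value_counts(dropna=False)  # NaN gets its own row

print(counts.sum(), counts_with_na.sum(), len(df))
```

Only the dropna=False total matches len(df) here.
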
Upvotes: 2

tshauck

Reputation: 21544

Regarding your question: are you counting a single field? I have framed this as a question, but I hope it helps.

Say I have the following DataFrame

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(0, 1, (5, 2)), columns=["A", "B"])

You could count a single column by

df.A.count()
#or
df['A'].count()

both evaluate to 5.

The cool thing (or one of many w.r.t. pandas) is that if you have NA values, count takes that into consideration.

So if I did

df.loc[1::2, 'A'] = np.nan  # .loc avoids chained assignment; np.NAN was removed in NumPy 2.0
df.count()

The result would be

 A    3
 B    5
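
To make the relationship with the row count explicit: for each column, the non-NA count plus the number of missing values equals len(df). A minimal sketch along the lines of the example above (the seeded generator is only there to make it reproducible):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0, 1, (5, 2)), columns=["A", "B"])
df.loc[1::2, "A"] = np.nan  # blank out rows 1 and 3 of column A

non_na = df.count()        # non-NA values per column
missing = df.isna().sum()  # missing values per column

# For every column, non-NA plus missing equals the total row count.
assert ((non_na + missing) == len(df)).all()
print(non_na.to_dict(), missing.to_dict())
```

So count() answers "how many usable values per column", while len(df) answers "how many rows".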

Upvotes: 27
