Pete Lucas
Pete Lucas

Reputation: 37

Exclude last two rows when import a csv file using read_csv in Pandas

Afternoon All,

I am extracting data from SQL server to a csv format then reading the file in.

df = pd.read_csv(
                            'TKY_RFQs.csv', 
                            sep='~', 
                            usecols=[
                                     0,1,2,3,4,5,6,7,8,9,
                                     10,11,12,13,14,15,16,17,18,19,
                                     20,21,22,23,24,25,26,27,28,29,
                                     30,31,32,33,34,35,36,37
                                    ]
                )

There is a blank row then the record count at the end of the file which I would like to remove.

End of file screenshot

I have been getting around the issue via this code but would like to resolve the root problem:

# Count_Row=df.shape[0] # gives number of row count
# df_Sample = df[['trading_book','state', 'rfq_num_of_dealers']].head(Count_Row-1)

Is there a way to exclude the last two rows in the file or alternativcely remove any row which has null values for all columns?

Pete

Upvotes: 0

Views: 4417

Answers (2)

edesz
edesz

Reputation: 12406

You can make use of skiprows directly in .read_csv

df = pd.read_csv(
                            'TKY_RFQs.csv', 
                            sep='~', 
                            usecols=[
                                     0,1,2,3,4,5,6,7,8,9,
                                     10,11,12,13,14,15,16,17,18,19,
                                     20,21,22,23,24,25,26,27,28,29,
                                     30,31,32,33,34,35,36,37
                                    ],
                            skiprows=-2 # added this line to skip rows when reading
                )

Upvotes: 2

Omi Harjani
Omi Harjani

Reputation: 840

Could you try :

df = pd.read_csv(
                            'TKY_RFQs.csv', 
                            sep='~', 
                            usecols=[
                                     0,1,2,3,4,5,6,7,8,9,
                                     10,11,12,13,14,15,16,17,18,19,
                                     20,21,22,23,24,25,26,27,28,29,
                                     30,31,32,33,34,35,36,37
                                    ]
                )[:-2]

Example:

from pandas import read_csv
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(url, names=names)[:-2] #to exclude last two rows
#data = read_csv(url, names=names) #to include all rows
print data
#description = data.describe()

Upvotes: 2

Related Questions