mattp

Reputation: 65

Reading specific rows out of a pandas dataframe using a list

I have a pandas dataframe that I need to pull specific rows out of and into a new dataframe. The row numbers are in a list that looks something like this: [42 50 52 59 60 62]

I am creating the dataframe from a .csv file, but as far as I can tell there is no way to specify which row numbers to load when reading the .csv and creating the dataframe.

import pandas as pd 

df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35',index_col = False, header = None )

Here's a portion of the dataframe:

                    0
0      1 269 245 44 5
1      2 293 393 33 5
2     3 295 175 67 12
3      4 298 415 33 5
4    5 304 392 213 11

Upvotes: 0

Views: 1941

Answers (3)

Chris

Reputation: 29742

Use skiprows with a callable:

import pandas as pd

keep_rows = [42, 50, 52, 59, 60, 62]

# skiprows calls the function with each 0-based line number; returning True skips that line
df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35',
                 header=None,
                 skiprows=lambda x: x not in keep_rows)
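To see which lines a callable skiprows keeps, here is a minimal sketch that runs without the original file. The in-memory CSV text and the short keep_rows set are made up for illustration; only read_csv, header and skiprows come from the answer.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: ten lines, one value per line.
csv_text = "\n".join(f"row{i}" for i in range(10)) + "\n"

# Line numbers to keep, counted from 0 (that is what the callable receives).
keep_rows = {2, 5, 7}

df = pd.read_csv(io.StringIO(csv_text),
                 header=None,
                 skiprows=lambda x: x not in keep_rows)

kept_values = df[0].tolist()
print(kept_values)
```

The resulting dataframe gets a fresh 0..n-1 index, so the original line numbers are not preserved in it.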

Upvotes: 4

Serge Ballesta

Reputation: 149085

Unfortunately, pandas read_csv expects a real file (or file-like object), not a mere line generator, so it is not easy to hand it only a handful of lines. But you can do that easily at the Python level:

import io

wanted = [42, 50, 52, 59, 60, 62]  # line numbers counted from 1 (see enumerate below)
lines = [line for i, line in enumerate(open('/Users/uni/Desktop/corrindex+id/rt35'), 1)
         if i in wanted]
df = pd.read_csv(io.StringIO(''.join(lines)), index_col=False, header=None)

You can also use skiprows to ignore all the lines except the ones to keep:

# here the callable receives line numbers counted from 0
df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35', index_col=False,
                 header=None, skiprows=lambda x: x not in [42, 50, 52, 59, 60, 62])
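One thing worth checking before picking either variant: the two snippets in this answer count lines differently. enumerate(..., 1) numbers lines from 1, while the skiprows callable is given 0-based line numbers, so the same list selects different rows. A small sketch on made-up in-memory data (the csv_text and wanted values are assumptions for illustration only):

```python
import io
import pandas as pd

# Hypothetical data: eight lines named line1 .. line8.
csv_text = "\n".join(f"line{n}" for n in range(1, 9)) + "\n"
wanted = [2, 5]

# Variant 1: filter at the Python level, counting lines from 1.
lines = [line for i, line in enumerate(io.StringIO(csv_text), 1) if i in wanted]
by_filter = pd.read_csv(io.StringIO("".join(lines)), header=None)

# Variant 2: skiprows callable, which sees line numbers counted from 0.
by_skiprows = pd.read_csv(io.StringIO(csv_text), header=None,
                          skiprows=lambda x: x not in wanted)

filtered_values = by_filter[0].tolist()
skipped_values = by_skiprows[0].tolist()
print(filtered_values, skipped_values)
```

If the list holds 0-based dataframe positions, the skiprows form matches them directly; for the enumerate form, start the count at 0 instead of 1.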

Upvotes: 1

zipa

Reputation: 27879

You can go about it like this:

import pandas as pd

my_list = [42, 50, 52, 59, 60, 62] 

df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35',
                 index_col= False,
                 header=None,
                 nrows=max(my_list) + 1).iloc[my_list]
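A quick way to check what the nrows plus .iloc combination returns, sketched on made-up in-memory data rather than the real file (csv_text and the short my_list are assumptions for illustration):

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: ten lines, one value per line.
csv_text = "\n".join(f"row{i}" for i in range(10)) + "\n"

my_list = [2, 5, 7]  # 0-based row positions

# Read only as many rows as needed, then pick the wanted positions.
df = pd.read_csv(io.StringIO(csv_text),
                 header=None,
                 nrows=max(my_list) + 1).iloc[my_list]

selected_values = df[0].tolist()
original_positions = df.index.tolist()
print(selected_values, original_positions)
```

Unlike the skiprows approach, the index here still holds the original row positions; call reset_index(drop=True) if a fresh 0..n-1 index is preferred. The trade-off is that every row up to the largest position is parsed first.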

Upvotes: 0
