Reputation: 65
I have a pandas dataframe that I need to pull specific rows out of and into a new dataframe. The row numbers are in a list that looks something like this: [42 50 52 59 60 62]
I am creating the dataframe from a .csv file but as far as I can tell there is not a way to designate the row numbers when reading the .csv and creating the dataframe.
import pandas as pd
df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35',index_col = False, header = None )
Here's a portion of the dataframe:
0
0 1 269 245 44 5
1 2 293 393 33 5
2 3 295 175 67 12
3 4 298 415 33 5
4 5 304 392 213 11
Upvotes: 0
Views: 1941
Reputation: 29742
Use skiprows with a callable:
import pandas as pd
keep_rows = [42, 50, 52, 59, 60, 62]
# The callable receives each 0-based row index; returning True skips that row.
df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35',
                 header=None,
                 skiprows=lambda x: x not in keep_rows)
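To try this without the original file, here is a minimal sketch that runs the same skiprows callable against an in-memory CSV. The sample data (100 lines, one integer per line) and the name csv_text are made up for illustration; only the read_csv call mirrors the answer above.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: 100 lines holding 0..99.
csv_text = "\n".join(str(n) for n in range(100))

# 0-based line indices to keep; a set makes the membership test cheap.
keep_rows = {42, 50, 52, 59, 60, 62}

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    # Skip every line whose 0-based index is not in keep_rows.
    skiprows=lambda x: x not in keep_rows,
)

print(df[0].tolist())  # → [42, 50, 52, 59, 60, 62]
```

Note that the resulting frame is re-indexed from 0, so the original line numbers are not preserved in the index.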
Upvotes: 4
Reputation: 149085
Unfortunately, pandas read_csv expects a true file, and not a mere line generator, so it is not easy to select only a handful of lines. But you can do that easily at the Python level:
import io
import pandas as pd
keep = {42, 50, 52, 59, 60, 62}  # 1-based line numbers to keep
with open('/Users/uni/Desktop/corrindex+id/rt35') as f:
    lines = [line for i, line in enumerate(f, 1) if i in keep]
df = pd.read_csv(io.StringIO(''.join(lines)), index_col=False, header=None)
You can also use skiprows to ignore all the lines except the ones to keep:
df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35', index_col=False,
                 header=None, skiprows=lambda x: x not in [42, 50, 52, 59, 60, 62])
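One thing to watch when switching between the two approaches: enumerate(..., 1) counts lines from 1, while the skiprows callable receives 0-based indices. The sketch below, on made-up in-memory data (csv_text and wanted are hypothetical names), shows the shift needed for both approaches to select the same lines.

```python
import io
import pandas as pd

# Hypothetical sample: ten lines, "line1" through "line10".
csv_text = "\n".join(f"line{n}" for n in range(1, 11))
wanted = [2, 5, 7]  # 1-based line numbers

# Approach 1: pre-filter the lines in Python, counting from 1.
lines = [line for i, line in enumerate(io.StringIO(csv_text), 1) if i in wanted]
df_filter = pd.read_csv(io.StringIO("".join(lines)), header=None)

# Approach 2: skiprows gets 0-based indices, so add 1 before comparing.
df_skip = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    skiprows=lambda x: x + 1 not in wanted,
)

print(df_filter.equals(df_skip))  # → True
```

Whether your list holds 0-based or 1-based numbers depends on where it came from, so check one known row before trusting the result.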
Upvotes: 1
Reputation: 27879
You can go about it like this:
import pandas as pd
my_list = [42, 50, 52, 59, 60, 62]
df = pd.read_csv('/Users/uni/Desktop/corrindex+id/rt35',
index_col= False,
header=None,
nrows=max(my_list) + 1).iloc[my_list]
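Here is a small self-contained sketch of the same idea on in-memory data (the sample values and the name csv_text are assumptions for illustration). It reads only as many rows as needed, then picks the wanted positions with iloc.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: 100 lines holding 0, 10, 20, ...
csv_text = "\n".join(str(n * 10) for n in range(100))

my_list = [42, 50, 52, 59, 60, 62]  # 0-based row positions

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    nrows=max(my_list) + 1,  # stop reading once the last wanted row is in
).iloc[my_list]

print(df.index.tolist())  # → [42, 50, 52, 59, 60, 62]
```

Unlike the skiprows approach, this keeps the original row positions as the index, at the cost of parsing every row up to the largest one requested.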
Upvotes: 0