Reputation: 2937
I have a dataframe df:
20060930 10.103 NaN 10.103 7.981
20061231 15.915 NaN 15.915 12.686
20070331 3.196 NaN 3.196 2.710
20070630 7.907 NaN 7.907 6.459
Then I want to select the rows whose sequence numbers are given in a list, say [1, 3], which should leave:
20061231 15.915 NaN 15.915 12.686
20070630 7.907 NaN 7.907 6.459
How can I do this, or which function can do it?
Upvotes: 209
Views: 424123
Reputation: 2096
To get a new DataFrame from filtered indexes:
For my problem, I needed a new dataframe from the indexes. I found a straightforward way to do this:
# note: df.filter matches index labels, not integer positions
iloc_list = [1, 2, 4, 8]
df_new = df.filter(items=iloc_list, axis=0)
You can also filter columns using this. Please see the documentation for details.
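For instance, a minimal sketch of the column case (the column names here are hypothetical):
# axis=1 (or axis="columns") matches column labels instead of index labels
df_new = df.filter(items=["open", "close"], axis=1)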
Upvotes: 3
Reputation: 13975
Use .iloc for integer-based indexing and .loc for label-based indexing. See the example below:
ind_list = [1, 3]
df.iloc[ind_list]
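As a quick sketch of the label-based counterpart, assuming the date column from the question is the DataFrame's index and is stored as integers:
# .loc selects by index label rather than by integer position
label_list = [20061231, 20070630]
df.loc[label_list]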
Upvotes: 251
Reputation: 889
What you are trying to do is to filter your dataframe by index. The best way to do that in pandas at the moment is the following:
Single Index
desired_index_list = [1,3]
df[df.index.isin(desired_index_list)]
Multiindex
desired_index_list = [1,3]
index_level_to_filter = 0
df[df.index.get_level_values(index_level_to_filter).isin(desired_index_list)]
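A minimal runnable sketch of the MultiIndex case (the frame, level names, and values below are made up for illustration):
import pandas as pd

# toy frame with a two-level index: (group, sequence number)
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 3), ("b", 4)], names=["group", "seq"]
)
df_multi = pd.DataFrame({"val": [10, 20, 30, 40]}, index=idx)

# keep rows whose "seq" level (level 1) is in the list
df_multi[df_multi.index.get_level_values(1).isin([1, 3])]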
Upvotes: 5
Reputation: 959
If index_list contains your desired indices, you can get the dataframe with the desired rows by doing
index_list = [1,2,3,4,5,6]
df.loc[df.index[index_list]]
This is based on the latest documentation as of March 2021.
Upvotes: 27
Reputation: 2472
There are many ways to solve this problem; the ones listed above are the most commonly used. I want to add two more, in case someone is looking for an alternative.
index_list = [1, 3]
df.take(index_list)
# or
df.query('index in @index_list')
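A minimal sketch of what each call matches (the toy frame below is made up): take uses integer positions, while the index keyword in query matches index labels.
import pandas as pd

# toy frame whose index labels do not equal the row positions
df_toy = pd.DataFrame({"x": [10, 20, 30, 40]}, index=[100, 101, 102, 103])

index_list = [1, 3]
df_toy.take(index_list)               # rows at positions 1 and 3 (labels 101 and 103)
df_toy.query('index in @index_list')  # rows whose labels are 1 or 3 (empty here)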
Upvotes: 4
Reputation: 28309
You can also use .iloc:
df.iloc[[1,3],:]
This will not work if the indexes in your dataframe do not correspond to the order of the rows due to prior computations. In that case use:
df[df.index.isin([1, 3])]
... as suggested in other responses.
Upvotes: 153
Reputation: 1517
Another way (although the code is a bit longer) that is faster than the ones above. Check it with the %timeit magic:
df[df.index.isin([1,3])]
PS: I'll leave it to you to figure out why.
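A sketch of how the comparison could be run in an IPython/Jupyter session (timings not reproduced here):
# %timeit is an IPython magic, so this runs in IPython/Jupyter
ind_list = [1, 3]
%timeit df.iloc[ind_list]
%timeit df[df.index.isin(ind_list)]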
Upvotes: 120
Reputation: 44615
For large datasets, it is memory efficient to read only the selected rows via the skiprows parameter.
Example
pred = lambda x: x not in [1, 3]
pd.read_csv("data.csv", skiprows=pred, index_col=0, names=...)
This returns a DataFrame built from the file, skipping every row except rows 1 and 3.
Details
From the docs:
skiprows : list-like or integer or callable, default None
...
If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
This feature works in pandas 0.20.0+. See also the corresponding issue and a related post.
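As a self-contained sketch (the file name, header handling, and kept rows below are assumptions, since the original call elides the column names):
import pandas as pd

# assumption: data.csv has a header on file row 0; keep that header plus
# the data rows at file positions 1 and 3, skipping everything else
keep = {0, 1, 3}
pred = lambda x: x not in keep   # returning True tells read_csv to skip the row
df = pd.read_csv("data.csv", skiprows=pred, index_col=0)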
Upvotes: 6