Reputation: 59
I have a use case where I have say 10 cols out of which 5 start with the string 'Region'. I need to get a resulting dataframe which only contains those cols (starting with string 'Region'). Not only that, I need to make sure the order is preserved (e.g. if in original df, the col order is 'Region 1', 'Region 2', 'Region 3'
-- this should be preserved and not result in 'Region 3', 'Region 2', 'Region 1'
instead).
Would following the 'accepted answer' for this question preserve the order or is there some other method to achieve that?
stackoverflow - find-column-whose-name-contains-a-specific-string
Upvotes: 0
Views: 446
Reputation: 23099
if your data frame is similar to :
print(df)
Region 3 Region 2 Region 1 Custom UnwantedCol
0 0 0 0 0 0
we can use the sorted
method to sort your columns by the number:
nat_cols_sort = dict(sorted(
{col: int(col.split(" ")[1]) for col in df.filter(regex='^Region').columns}.items(),
key=lambda x: x[1],
))
print(df[nat_sort.keys()])
Region 1 Region 2 Region 3
0 0 0 0
Upvotes: 1
Reputation: 976
Yes, it will. df.columns
is a list, when you iterate over list, you preserve the order of the list. Thus, you can use the answer from the mentioned link:
region_cols = [col for col in df.columns if 'Region' in col]
df[region_cols]
- will be the df you require.
Upvotes: 2