How to select columns based on value they contain pandas

Question

I am working in pandas with a dataset that describes the population of each country per year. The dataset is structured in an odd way: the years aren't the column names themselves but values in the first row of the set. The dataset covers every year from 1960 up to now, but I only need 1970, 1980, 1990 etc. For this purpose I've created a list with those years and tried to make a new dataset that is equivalent to the old one but only has the columns containing a value from that list, so I don't carry around all the extra info I'm not using. Online I can only find instructions for removing rows or selecting by column name. Since neither applies in this situation, I thought I should ask here. The dataset is a csv file which I've downloaded from a world population site. Here is a link to a screenshot of the data.

As the screenshot shows, some of the years are displayed in scientific notation (for example 1.970000e+03), which is also how I've added them to my list.

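import pandas as pd

# header=None keeps the row of years as the first data row rather than as column names
pop = pd.read_csv('./maps/API_SP.POP.TOTL_DS2_en_csv_v2_10576638.csv',
                  header=None, engine='python', skiprows=4)
display(pop)  # display() is available in Jupyter/IPython
# Attempt: keep only the columns that contain one of these values
years = ['1.970000e+03','1.980000e+03','1.990000e+03','2.000000e+03','2.010000e+03','2.015000e+03', 'Country Name']
pop[pop.columns[pop.isin(years).any()]]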

This is one of the things I've tried so far which I thought made the most sense, but I am still very new to pandas so any help would be greatly appreciated.
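
A minimal sketch of one way this might work, under two assumptions that are not confirmed by the post: that the year cells were parsed as floats (so '1.970000e+03' is only how 1970.0 is displayed, and string versions of it will never match), and that only the first row should be tested rather than the whole frame. The small DataFrame below is a made-up stand-in for the real CSV, with invented numbers, and the names `wanted`, `mask` and `selected` are illustrative.

```python
import pandas as pd

# Toy stand-in for the CSV read with header=None: row 0 holds the labels.
# The population figures are invented placeholders, not real data.
pop = pd.DataFrame([
    ["Country Name", 1960.0, 1965.0, 1970.0, 1980.0],
    ["Country A", 100.0, 110.0, 120.0, 140.0],
    ["Country B", 200.0, 210.0, 220.0, 240.0],
])

# Compare against numbers, not strings such as '1.970000e+03'.
wanted = ["Country Name", 1970.0, 1980.0, 1990.0, 2000.0, 2010.0, 2015.0]

# Test only the first row, then use the boolean mask to pick columns.
mask = pop.iloc[0].isin(wanted)
selected = pop.loc[:, mask].copy()

# Optionally promote the first row to column labels and drop it from the data.
selected.columns = selected.iloc[0].tolist()
selected = selected.iloc[1:].reset_index(drop=True)

print(selected)
```

Checking only `pop.iloc[0]` avoids accidentally keeping a column just because some population value happens to equal one of the years. Another option may be to drop `header=None` so that `read_csv` uses that row as the column names, after which the columns could be selected by name (the names would then be strings such as '1970').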
