pandas - show column name + sum in which the sum is higher than zero

Question

I read my dataframe in with:

dataframe = pd.read_csv("testFile.txt", sep="\t", index_col=0)

I got a dataframe like this:

cell 17472131 17472132 17472133 17472134 17472135 17472136
cell_0 1 0 1 0 1 0
cell_1 0 0 0 0 1 0
cell_2 0 1 1 1 0 0 
cell_3 1 0 0 0 1 0

With pandas, I would like to get the names of all columns whose sum is greater than 1, together with that sum. So I would like:

17472131 2
17472133 2
17472135 3

I figured out how to get the sums of each column with

dataframe.sum(axis=0)

but this also returns the columns with a sum lower than 2. Is there a way to show only the columns whose sum is higher than some value, e.g. 1?

Scott Boston · Accepted Answer

One pretty neat way is to use a lambda function in loc:

df.set_index('cell').sum().loc[lambda x: x>1]

Output:

17472131    2
17472133    2
17472135    3
dtype: int64

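Details: df.sum() returns a pd.Series, and the lambda x: x > 1 passed to loc produces a boolean Series, which loc uses for boolean indexing to keep only the entries that are True. If 'cell' is already the index (as with index_col=0 in the question), the set_index('cell') call is not needed and can be dropped.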
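
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question. It assumes the file is tab-separated and read with index_col=0 as in the question; the in-memory buffer stands in for testFile.txt, and the variable names col_sums, via_lambda and via_mask are just illustrative. It also shows the plain boolean-mask form, which should give the same result as the lambda.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of testFile.txt
# (assumption: the real file is tab-separated with 'cell' as first column).
raw = (
    "cell\t17472131\t17472132\t17472133\t17472134\t17472135\t17472136\n"
    "cell_0\t1\t0\t1\t0\t1\t0\n"
    "cell_1\t0\t0\t0\t0\t1\t0\n"
    "cell_2\t0\t1\t1\t1\t0\t0\n"
    "cell_3\t1\t0\t0\t0\t1\t0\n"
)
dataframe = pd.read_csv(io.StringIO(raw), sep="\t", index_col=0)

# Sum each column; the result is a Series indexed by column name.
col_sums = dataframe.sum(axis=0)

# Callable passed to loc: it receives the Series and returns a boolean mask.
via_lambda = col_sums.loc[lambda s: s > 1]

# Equivalent two-step form with an explicit boolean mask.
via_mask = col_sums[col_sums > 1]

print(via_lambda)
```

The lambda form is handy when you want to stay in one method chain; the mask form needs the intermediate variable but may read more plainly. Note that the column labels come back as strings here, since they are read from the header row.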
