How to filter a pandas dataframe without headers

Question

I'm trying to filter a large CSV file that has no header row. I would like to get a second dataframe containing only the rows where the value in the last column is positive.

Here is what I'm trying:

input_data = pd.read_csv(infile, delimiter=',').values
print(input_data.shape)  # (832650, 200)
pos_data = input_data.iloc[:, 199] > 0

The last line gives the error: AttributeError: 'numpy.ndarray' object has no attribute 'iloc'

I'm on 0.24.1 of pandas and 1.16.1 of numpy.

Thank you

EDIT: Removing .values gets rid of the error, but I still can't filter the dataframe.

input_data = pd.read_csv(infile, delimiter=',')
print(input_data.shape)  # (832650, 200)
pos_data = input_data.iloc[:, -1] > 0
print(pos_data.shape)  # (832650,)

jezrael · Accepted Answer

Use boolean indexing:

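# header=None because the CSV has no header row
input_data = pd.read_csv(infile, header=None)
# keep only the rows whose last column is positive
df = input_data[input_data.iloc[:, -1] > 0]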
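
For illustration, here is a minimal self-contained sketch of the same idea on a tiny in-memory CSV. The sample data and the names csv_text and mask are made up for this example. It passes header=None, since the file in the question has no header row; without it, read_csv would use the first data row as column names. The comparison on its own only produces a boolean Series (which is what the question's EDIT ends up with); indexing the dataframe with that Series is what actually selects the rows.

```python
import io

import pandas as pd

# Hypothetical stand-in for the headerless CSV from the question.
csv_text = "1,2,-3\n4,5,6\n7,8,0\n9,10,11\n"

# header=None stops pandas from consuming the first row as column names.
input_data = pd.read_csv(io.StringIO(csv_text), header=None)

# Comparing the last column yields a boolean Series, one flag per row.
mask = input_data.iloc[:, -1] > 0

# Indexing the DataFrame with that Series keeps only the rows flagged True.
pos_data = input_data[mask]

print(input_data.shape)  # (4, 3)
print(pos_data.shape)    # (2, 3)
```

The filtered dataframe keeps the original row labels (here 1 and 3); calling pos_data.reset_index(drop=True) renumbers them from zero if that is preferred.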
