Rhys

Reputation: 425

How to filter a pandas dataframe without headers

I'm trying to filter a large CSV that does not contain any headers. I would like to get a second dataframe containing only the rows where the value in the last column is positive.

Here is what I'm trying:

input_data = pd.read_csv(infile, delimiter=',').values
print(input_data.shape)  # (832650, 200)
pos_data = input_data.iloc[:, 199] > 0

The last line gives the error: AttributeError: 'numpy.ndarray' object has no attribute 'iloc'

I'm on pandas 0.24.1 and numpy 1.16.1.

Thank you

EDIT: Removing .values gets rid of the error, but I still can't filter the dataframe.

input_data = pd.read_csv(infile, delimiter=',')
print(input_data.shape)  # (832650, 200)
pos_data = input_data.iloc[:, -1] > 0
print(pos_data.shape)  # (832650,)

Upvotes: 3

Views: 5641

Answers (1)

jezrael

Reputation: 862851

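Use boolean indexing. Because the file has no header row, pass header=None so that the first row is read as data rather than as column names:

input_data = pd.read_csv(infile, header=None)
# keep only the rows whose last column is positive
df = input_data[input_data.iloc[:, -1] > 0]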
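For illustration, here is a minimal sketch of boolean indexing on a tiny in-memory CSV. The three-column data and the names csv_text and mask are made up for the example and stand in for the real 200-column file. It assumes the file has no header row, so header=None is passed to read_csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the headerless CSV (the real file has 200 columns).
csv_text = "1,2,-3\n4,5,6\n7,8,0\n9,10,11\n"

# header=None keeps the first row as data and labels the columns 0, 1, 2.
input_data = pd.read_csv(io.StringIO(csv_text), header=None)

# Boolean Series with one entry per row: True where the last column is > 0.
mask = input_data.iloc[:, -1] > 0

# Indexing the frame with the mask keeps only the rows marked True.
pos_data = input_data[mask]
print(pos_data.shape)
```

The comparison on its own only produces the mask, which has one entry per row, like the (832650,) shape printed in the question's EDIT. Passing that mask back into input_data[...] is the step that does the filtering.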

Upvotes: 4
