Python Dataframe

Question

I am a Java programmer and I am learning Python for data science and analysis purposes.

I wish to clean the data in a DataFrame, but I am confused by the pandas logic and syntax.

What I wish to achieve is something like the following Java code:

for (String name : names) {
    if (name == "test") {
        name = "myValue";
    }
}

How can I do this with Python and a pandas DataFrame? I tried the following, but it does not work:

import pandas as pd
import numpy as np

df = pd.read_csv('Dataset V02.csv')

array = df['Order Number'].unique()

# On average, how many items does one order have?

for value in array:
    count = 0
    if df['Order Number'] == value:
        ......

I get an error at df['Order Number'] == value. How can I identify the specific values and edit them?

In short, I want to:
- Check all the entries of the 'Order Number' column
- Execute an action (for example, replace the value or count it) each time the record is equal to a given value (for example, the order code)

EdChum · Accepted Answer

Just use the vectorised form for replacement:

df.loc[df['Order Number'] == 'test', 'Order Number'] = 'myValue'

This compares the entire column against a specific value; where the comparison is True, only those rows are replaced with the new value.
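
As a minimal sketch of that masked assignment on made-up data (the DataFrame contents and the 'Item' column below are assumptions for illustration, not the asker's CSV):

```python
import pandas as pd

# Hypothetical sample data standing in for 'Dataset V02.csv'
df = pd.DataFrame({
    'Order Number': ['A1', 'test', 'B2', 'test'],
    'Item': ['pen', 'ink', 'pad', 'cap'],
})

# Boolean mask: True for rows whose 'Order Number' equals 'test'
mask = df['Order Number'] == 'test'

# Assign the new value only to the rows selected by the mask
df.loc[mask, 'Order Number'] = 'myValue'

replaced = df['Order Number'].tolist()
print(replaced)
```

For a plain equality replacement like this one, df['Order Number'].replace('test', 'myValue') should give the same column; the mask form is more flexible when the condition is not a simple match.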

For the second part, the if statement doesn't understand boolean arrays; it expects a scalar result. If you're just doing a unique value/frequency count, then just do:

df['Order Number'].value_counts()
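
A small sketch of how this could answer the "items per order" question from the post, on made-up data. It assumes each row of the file is one item of an order, which the question does not confirm:

```python
import pandas as pd

# Hypothetical sample data; assumes each row is one item of an order
df = pd.DataFrame({'Order Number': ['A1', 'A1', 'B2', 'A1', 'C3', 'B2']})

# Number of rows per distinct order number, most frequent first
counts = df['Order Number'].value_counts()

# Count for one specific value: True counts as 1 when the mask is summed
a1_count = int((df['Order Number'] == 'A1').sum())

# Average number of rows (items) per order
avg_items = counts.mean()

print(counts)
print(a1_count, avg_items)
```

Summing the boolean mask replaces the manual loop with a counter, and no if statement is ever applied to a whole column.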
