Reputation: 1537
I am a Java programmer learning Python for data science and analysis.
I want to clean the data in a DataFrame, but I am confused by the pandas logic and syntax.
What I want to achieve is something like the following Java code:
for (String name : names) {
    // Compare string contents with equals(); == compares references in Java
    if ("test".equals(name)) {
        name = "myValue";
    }
}
How can I do this with Python and a pandas DataFrame? I tried the following, but it does not work:
import pandas as pd
import numpy as np
df = pd.read_csv('Dataset V02.csv')
array = df['Order Number'].unique()
# On average, how many items does one order have?
for value in array:
    count = 0
    if df['Order Number'] == value:
        ......
I get an error at df['Order Number'] == value. How can I identify the specific values and edit them?
In short, I want to:
- Check all the entries of the 'Order Number' column
- Execute an action (for example, replace the value or count it) each time the record equals a given value (for example, the order code)
Upvotes: 1
Views: 99
Reputation: 394279
Just use the vectorised form for replacement:
df.loc[df['Order Number'] == 'test', 'Order Number'] = 'myValue'
This compares the entire column against a specific value and, where the comparison is True, replaces just those rows with the new value.
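To make the masking step concrete, here is a minimal sketch on a small made-up DataFrame. The sample rows and the 'Item' column are assumptions for illustration, not taken from the asker's CSV:

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question.
df = pd.DataFrame({
    "Order Number": ["A1", "test", "B2", "test"],
    "Item": ["pen", "ink", "pad", "cap"],
})

# Element-wise comparison yields a boolean Series, one entry per row.
mask = df["Order Number"] == "test"

# Assign only to the rows where the mask is True, in the named column.
df.loc[mask, "Order Number"] = "myValue"

replaced = df["Order Number"].tolist()
print(replaced)
```

Rows where the mask is False are left untouched, so no explicit loop over the rows is needed.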
For the second part, the if statement doesn't understand boolean arrays; it expects a scalar result, which is why comparing a whole column inside an if raises an error. If you're just doing a unique value/frequency count, then just do:
df['Order Number'].value_counts()
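As a rough sketch of the counting side, assuming each row of the data is one item of an order (the sample values below are invented), the boolean mask can be reduced to a scalar with any() or sum(), and the mean of value_counts() gives the average number of items per order:

```python
import pandas as pd

# Hypothetical data: one row per item, labelled with its order number.
df = pd.DataFrame({"Order Number": ["A1", "A1", "B2", "A1", "C3", "B2"]})

mask = df["Order Number"] == "A1"

# A plain `if mask:` raises ValueError because the truth value of a
# Series is ambiguous; reduce the mask to a single value first.
has_a1 = bool(mask.any())   # is there at least one matching row?
a1_rows = int(mask.sum())   # how many rows match?

# Rows per distinct order number, then the average across orders.
counts = df["Order Number"].value_counts()
avg_items = counts.mean()

print(has_a1, a1_rows, avg_items)
```

If one row is not one item in the real dataset, the average would need to be computed from a quantity column instead.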
Upvotes: 1
Reputation: 178
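The code goes this way:
import pandas as pd
df = pd.read_csv("Dataset V02.csv")
array = df['Order Number'].unique()
for value in array:
    count = 0
    # "in" on a Series tests the index labels, so compare against .values
    if value in df['Order Number'].values:
        .......
You need to use "in" to check for presence. Note that "in" applied directly to a Series checks its index labels rather than its values, so test against .values. Did I understand your problem correctly? If not, please comment and I will try to understand further.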
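One caveat worth checking before relying on a membership test: a small sketch, using an invented Series with the default integer index, of how "in" behaves on a pandas Series compared with its values:

```python
import pandas as pd

# Hypothetical column with the default index 0, 1, 2.
orders = pd.Series(["A1", "B2", "C3"])

# `in` on a Series looks at the index labels, not the stored values.
label_hit = 0 in orders        # 0 is an index label
value_miss = "A1" in orders    # "A1" is a value, not a label

# To test the values, use the underlying array or isin().
value_hit = "A1" in orders.values
isin_hit = bool(orders.isin(["A1"]).any())

print(label_hit, value_miss, value_hit, isin_hit)
```

For counting or replacing matches, a boolean mask such as orders == "A1" is usually a better fit than a membership test inside a loop.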
Upvotes: 0