Iterate and change value based on function in Python pandas

Question

Please help. This seems easy, but I can't figure it out.
DataFrame (df) contains numbers. For each column:
* compute the mean and std
* compute a new value for each value in each row in each column
* replace that value with the new value

Method 1

import numpy as np
import pandas as pd
n = 1
while n



Method 2


    labels = df.columns.values.tolist()
    df2 = df.ix[:,0]
    n = 1
    while n


Error: ValueError: If using all scalar values, you must pass an index


Also tried the .apply method but the new DataFrame doesn't change the values.


print(df.to_json()):
{"col1":{"subj1":4161.97,"subj2":5794.73,"subj3":4740.44,"subj4":4702.84,"subj5":3818.94},"col2":{"subj1":13974.62,"subj2":19635.32,"subj3":17087.721851,"subj4":13770.461021,"subj5":11546.157578},"col3":{"subj1":270.7,"subj2":322.607708,"subj3":293.422314,"subj4":208.644585,"subj5":210.619961},"col4":{"subj1":5400.16,"subj2":7714.080365,"subj3":6023.026011,"subj4":5880.187272,"subj5":4880.056292}}

Jonathan Eunice · Accepted Answer

It looks like you're trying to do operations on DataFrame columns and values as though DataFrames were simple lists or arrays, rather than in the vectorized / column-at-a-time way more usual for NumPy and Pandas work.

A simple, first-pass improvement might be:

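# import your data (json_text is assumed to hold the JSON string printed in the question)
import json
import numpy as np
import pandas as pd
df = pd.DataFrame(json.loads(json_text))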

# loop over only numeric columns
for col in df.select_dtypes([np.number]):
    # compute column mean and std
    col_mean = df[col].mean()
    col_std  = df[col].std()
    # adjust column to normalized values
    df[col] = df[col].apply(lambda x: (x - col_mean) / col_std)

That works one column at a time. It retains some explicit looping, and `apply` still calls the lambda once per value, but it is straightforward and relatively beginner-friendly.
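The per-value `apply` above can also be replaced with plain column arithmetic, since a pandas Series supports subtracting and dividing by a scalar across all of its values at once. Here is a minimal sketch, assuming a small sample frame built from the first two columns of the question's data; the `zscore_columns` helper name is hypothetical and exists only for illustration:

```python
import numpy as np
import pandas as pd

# Sample data: the first two columns from the question's JSON.
df = pd.DataFrame(
    {
        "col1": [4161.97, 5794.73, 4740.44, 4702.84, 3818.94],
        "col2": [13974.62, 19635.32, 17087.721851, 13770.461021, 11546.157578],
    },
    index=["subj1", "subj2", "subj3", "subj4", "subj5"],
)


def zscore_columns(frame):
    """Return a copy with each numeric column standardized (hypothetical helper)."""
    out = frame.copy()
    for col in out.select_dtypes([np.number]):
        # Series arithmetic acts on every value at once; no per-value lambda needed.
        out[col] = (out[col] - out[col].mean()) / out[col].std()
    return out


normalized = zscore_columns(df)
print(normalized)
```

Neither `apply` nor this arithmetic modifies the data in place; both return a new object, so the result has to be assigned back (here, to `out[col]`). Also note that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default.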

If you're comfortable with Pandas, it can be done more compactly:

numeric_cols = list(df.select_dtypes([np.number]))
df[numeric_cols] = df[numeric_cols].apply(lambda col: (col - col.mean()) / col.std(), axis=0)

In your revised DataFrame, there are no string columns. But the earlier DataFrame had string columns, which caused problems when they were computed upon, so let's be careful. `select_dtypes([np.number])` is a generic way to select numeric columns. If it's too much, you can simplify at the cost of generality by listing the columns explicitly:

numeric_cols = ['col1', 'col2', 'col3', 'col4']
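To illustrate why selecting numeric columns matters, here is a sketch using a frame with one string column. The `group` column is invented for this example; the numbers are taken from the question. It also shows that the arithmetic can be applied to all numeric columns at once, because subtracting `numeric.mean()` aligns on column labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "group": ["a", "b", "a", "b", "a"],  # hypothetical string column
        "col1": [4161.97, 5794.73, 4740.44, 4702.84, 3818.94],
        "col3": [270.7, 322.607708, 293.422314, 208.644585, 210.619961],
    },
    index=["subj1", "subj2", "subj3", "subj4", "subj5"],
)

# Pick out only the numeric columns; "group" is skipped.
numeric_cols = list(df.select_dtypes([np.number]))
numeric = df[numeric_cols]

# mean() and std() return one value per column; the arithmetic aligns on column names.
df[numeric_cols] = (numeric - numeric.mean()) / numeric.std()
print(df)
```

The string column passes through untouched, while each numeric column ends up with mean 0 and (sample) standard deviation 1.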
