kcm2174

Reputation: 169

Difficulty replacing values in Pandas column

First time posting here, pretty new to Python and having some difficulty rewriting a value in a pandas column. I have a data frame with some columns (among them 'Ba Avg', 'Study', 'Latitude', 'Upper Depth', etc). I'm trying to average some of the values and write them to a new column called 'Upper Avg'.

At the beginning I create a new column called 'Upper Avg' and save the index of the df as a list.

Next, for the rows of the data frame from 'Study' 1020, I make their 'Upper Avg' the same as their 'Ba Avg'. This, though probably not very efficient, works totally fine.

Next I want to deal with study 191, but there I want 'Upper Avg' to be the average of every 'Ba Avg' in the study that is at the same location ('Latitude') and has a value of a, b, or c in 'ConstraintCol'. To do this, I make a set of the Latitude values and loop through them one at a time, building a new df (latset) of the rows with that latitude (no latitude is repeated across studies). I then make another df, called "ranges", of the rows in latset that fulfill the constraint. I save the index of ranges to a list and compute the average I want, saving the value as 'avg'.

The problem comes when I try to put this value, avg, into df. I've tried using loc and using replace, and every time the assignment seems to run, but the value isn't there when I look at df later.

Sorry for the lengthy post but would appreciate any guidance!

df['Upper Avg'] = ""
index = df.index.tolist()

for i in index:
    if df['Study'][i] == 1020: 
        df['Upper Avg'][i] = df['Ba Avg'][i]

lats = set(df[df['Study'] == 191]['Latitude'])
for i in lats: 
    latset = df[df['Latitude'] == i]
    constraint = [a,b,c]
    ranges = latset[latset['ConstraintCol'].isin(constraint)]
    idx = ranges.index.tolist()
    avg = ranges['Ba Avg'].sum() / 3
    df.loc[idx]['Upper Avg'] = avg

Upvotes: 3

Views: 154

Answers (2)

Siddhartha Gandhi

Reputation: 317

With .loc, .iloc, or the older .ix (deprecated and since removed from pandas), you will probably (and correctly) get a warning about setting a value on a copy if you chain a second indexing step onto the first, like this:

df.loc[index]["column"]

The way to correctly do this is like this:

df.loc[index, "column"]

Note the difference. df.loc[index] on its own returns a new object, which is often a copy rather than a view of the original data frame. Indexing that object again with ["column"] and assigning to it changes only the temporary object, so the edit does not persist in df.

What you are doing is called chained indexing, and it is not reliable for assignment. See the link below. The easiest way to think about it: if you make a single indexing call on the dataframe (.loc or .iloc, with both the row and column selectors inside the same brackets), your changes are applied to the original dataframe and are permanent. If you chain multiple operations, for instance df[col][index], it is very likely you are editing a copy and not the original dataframe.

http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy
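
To make the difference concrete, here is a minimal sketch on a made-up frame (the data and the value 5.0 are assumptions for illustration, not taken from the question). The chained form leaves df untouched, while the single .loc call writes through. The warning pandas raises for the chained form is suppressed here only to keep the output quiet.

```python
import warnings

import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"Study": [191, 191, 1020], "Upper Avg": [0.0, 0.0, 0.0]})
idx = [0, 1]

# Chained indexing: df.loc[idx] builds a separate object and the
# assignment lands on that object, not on df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about this pattern
    df.loc[idx]["Upper Avg"] = 5.0
chained_result = df["Upper Avg"].tolist()

# Single .loc call with row labels and column label together:
# the assignment is applied to df itself.
df.loc[idx, "Upper Avg"] = 5.0
single_result = df["Upper Avg"].tolist()

print(chained_result)
print(single_result)
```

In your loop, that means writing the last line as df.loc[idx, 'Upper Avg'] = avg.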

Upvotes: 1

Merlin

Reputation: 25659

Start with this. np.where has the form outcome = np.where(condition, value_if_true, value_if_false), and it can replace your first loop:

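import numpy as np

# The original row-by-row loop, replaced by the np.where call below:
# df['Upper Avg'] = ""
# index = df.index.tolist()
#
# for i in index:
#     if df['Study'][i] == 1020:
#         df['Upper Avg'][i] = df['Ba Avg'][i]

# Rows from study 1020 get their 'Ba Avg'; all other rows start as NaN.
df['Upper Avg'] = np.where(df['Study'] == 1020, df['Ba Avg'], np.nan)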


lats           = set(df[df['Study'] == 191]['Latitude'])
for i in lats: 
    latset     = df[df['Latitude'] == i]
    constraint = [a,b,c]
    ranges     = latset[latset['ConstraintCol'].isin(constraint)]
    idx        = ranges.index.tolist()
    avg        = ranges['Ba Avg'].sum() / 3
    # Row labels and column label in one .loc call, so the value is written to df.
    df.loc[idx, 'Upper Avg'] = avg
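
For reference, here is a self-contained sketch of the whole thing that can be run as-is. The sample rows are invented, and the constraint values a, b, c are assumed to be the strings "a", "b", "c". The final assignment is written as a single .loc call with the row labels and the column name together, and .mean() is used for the average, which equals sum() / 3 only when exactly three rows match.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "Study": [1020, 191, 191, 191, 191],
    "Latitude": [10.0, 20.0, 20.0, 20.0, 20.0],
    "ConstraintCol": ["a", "a", "b", "c", "d"],
    "Ba Avg": [1.0, 2.0, 4.0, 6.0, 100.0],
})

# Study 1020 rows copy 'Ba Avg'; everything else starts as NaN.
df["Upper Avg"] = np.where(df["Study"] == 1020, df["Ba Avg"], np.nan)

constraint = ["a", "b", "c"]  # assumed to be string labels
lats = set(df.loc[df["Study"] == 191, "Latitude"])
for lat in lats:
    latset = df[df["Latitude"] == lat]
    ranges = latset[latset["ConstraintCol"].isin(constraint)]
    avg = ranges["Ba Avg"].mean()
    # One .loc call: row labels and column label together.
    df.loc[ranges.index, "Upper Avg"] = avg

print(df)
```

Rows that do not meet the constraint (the "d" row here) keep NaN in 'Upper Avg'.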

Upvotes: 1
