cookie1986

Reputation: 895

How to standardize values in a Pandas dataframe based on index position?

I have a number of pandas dataframes that each have a column 'speaker' containing one of two labels. Typically the labels are 0 and 1, but in some cases they are 1 and 2, 1 and 3, or 0 and 2. I am trying to find a way to iterate through all of my dataframes and standardize them so that they share the same labels (0 and 1).

The one consistent feature between them is that the first label to appear (i.e. in the first row of the dataframe) should always be mapped to '0', whereas the second should always be mapped to '1'.

Here is an example of one of the dataframes I would need to change - being mindful that others will have different labels:

import pandas as pd
data = [1,2,1,2,1,2,1,2,1,2]

df = pd.DataFrame(data, columns = ['speaker'])

I would like to be able to change it so that it appears as [0,1,0,1,0,1,0,1,0,1].

Thus far, I have tried inserting the following code within a bigger for loop that iterates through each dataframe. However it is not working at all:

for label in data['speaker']:
    if label == data['speaker'][0]:
        label = '0'
    else:
        label = '1'

Hopefully, the above makes clear that I am attempting to create a rule akin to: "find all instances in 'speaker' that match the label in the first index position and change them to '0'. Change all other instances to '1'."

Upvotes: 1

Views: 176

Answers (2)

Erfan

Reputation: 42896

Method 1:

We can use iat + np.where here for conditional creation of your column:

import numpy as np

first_val = df['speaker'].iat[0] # same as df['speaker'].iloc[0]

df['speaker'] = np.where(df['speaker'].eq(first_val), 0, 1)
   speaker
0        0
1        1
2        0
3        1
4        0
5        1
6        0
7        1
8        0
9        1

Method 2:

We can also make use of booleans, since we can cast them to integers:

first_val = df['speaker'].iat[0]
df['speaker'] = df['speaker'].ne(first_val).astype(int)
   speaker
0        0
1        1
2        0
3        1
4        0
5        1
6        0
7        1
8        0
9        1

If, and only if, your values are exactly 1 and 2, we can also use floor division (1 // 2 is 0 and 2 // 2 is 1):

df['speaker'] = df['speaker'] // 2
# same as: df['speaker'] = df['speaker'].floordiv(2)
   speaker
0        0
1        1
2        0
3        1
4        0
5        1
6        0
7        1
8        0
9        1
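
Since the question mentions looping over many dataframes with different label pairs, here is a minimal sketch of how Method 2 could be wrapped up and applied to each one. The function name relabel_speakers and the sample frames are made up for illustration; it also assumes every dataframe has at least one row, since .iat[0] raises on an empty column.

```python
import pandas as pd

def relabel_speakers(df, column="speaker"):
    """Return a copy where the first-seen label is 0 and any other label is 1.

    Hypothetical helper, not part of pandas. Assumes df has at least one row.
    """
    out = df.copy()
    first_val = out[column].iat[0]
    # ne() is True wherever the label differs from the first one; True -> 1
    out[column] = out[column].ne(first_val).astype(int)
    return out

# Illustrative dataframes using different label pairs (1-2, 1-3, 0-2)
frames = [
    pd.DataFrame({"speaker": [1, 2, 1, 2]}),
    pd.DataFrame({"speaker": [3, 1, 1, 3]}),
    pd.DataFrame({"speaker": [0, 2, 2, 0]}),
]

standardized = [relabel_speakers(f) for f in frames]
```

Returning a copy leaves the original dataframes untouched; assign back in place instead if that is not needed.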

Upvotes: 2

Horace

Reputation: 1054

You can use iloc to get the first value of the 'speaker' column, and then a boolean mask to set the values:

zero_map = df["speaker"].iloc[0]
mask_zero = df["speaker"] == zero_map
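# select the 'speaker' column explicitly so any other columns are left untouched
df.loc[mask_zero, "speaker"] = 0
df.loc[~mask_zero, "speaker"] = 1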
print(df)
   speaker
0        0
1        1
2        0
3        1
4        0
5        1
6        0
7        1
8        0
9        1
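
As a possible alternative to building the mask by hand, pd.factorize numbers the distinct values in order of first appearance, so the first label seen becomes 0 and the next becomes 1. This is only a sketch with made-up sample data: it assumes exactly two distinct labels and no missing values (a third label would be coded 2, and NaN is coded -1 by default).

```python
import pandas as pd

# Illustrative data where the first label to appear is 2, not 1
df = pd.DataFrame({"speaker": [2, 1, 2, 1]})

# codes are assigned in order of first appearance (sort=False is the default)
codes, uniques = pd.factorize(df["speaker"])
df["speaker"] = codes
print(df)
```

The uniques value keeps the original labels in the order they were encountered, which can be handy if you need to map back later.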

Upvotes: 1
