김지영
김지영

Reputation: 359

pandas conditional statements problems

Hi~ I'm handling my data.

I wanna extract data with conditional statements

Here is my code.

# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import os

join_file = r'D:\handling data\complete data\조인\after_join.csv'
pwd = os.getcwd()
os.chdir(os.path.dirname(join_file))
join_data = pd.read_csv(os.path.basename(join_file), sep=',', encoding='utf-8')

print(join_data.head())

enter image description here

join_data['cluster_z'] = 4 # 둘다 하락세           
join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4 )
                   & (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 1 # 다 상승세

join_data['cluster_z'][((join_data['cluster_x'] == 1 | join_data['cluster_x'] == 5)
                   & (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 2 # 전체 하락세, 점포당 상승세

join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4 )
                   & (join_data['cluster_y'] == 2 | join_data['cluster_y'] == 4))] = 3 # 전체 상승세, 점파당 하락세

print(join_data.head())

and after executing second print(join_data.head()). I got the error like picture

enter image description here

How can I fix it?? Thank in advance.

Upvotes: 1

Views: 156

Answers (1)

jezrael
jezrael

Reputation: 862641

It seems you omit a lot of brackets between conditions, also better is use loc:

Original:

join_data['cluster_z']
[((join_data['cluster_x'] == 3 | 
   join_data['cluster_x'] == 2 | 
   join_data['cluster_x'] == 4 ) &
  (join_data['cluster_y'] == 3 |
   join_data['cluster_y'] == 1))] = 1

Change to:

join_data.loc[
((join_data['cluster_x'] == 3) | 
 (join_data['cluster_x'] == 2) | 
 (join_data['cluster_x'] == 4) ) & 
((join_data['cluster_y'] == 3) | 
 (join_data['cluster_y'] == 1)), 'cluster_z'] = 1 

Or better use isin:

join_data.loc[
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1 

All together:

join_data = pd.DataFrame({'cluster_x':[3,2,5,3],
                         'cluster_y':[3,0,1,2]})

print (join_data)
   cluster_x  cluster_y
0          3          3
1          2          0
2          5          1
3          3          2

join_data['cluster_z'] = 4

join_data.loc[
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1 

join_data.loc[
(join_data['cluster_x'].isin([1,5])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 2 

join_data.loc[
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([2,4])), 'cluster_z'] = 3

print (join_data)
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3

Or more readable:

mask1 = join_data['cluster_x'].isin([3,2,4])
mask2 = join_data['cluster_y'].isin([3,1])
mask3 = join_data['cluster_x'].isin([1,5])
mask4 = join_data['cluster_y'].isin([2,4])

join_data['cluster_z'] = 4
join_data.loc[mask1 & mask2 , 'cluster_z'] = 1 
join_data.loc[mask3 & mask2 , 'cluster_z'] = 2 
join_data.loc[mask1 & mask4 , 'cluster_z'] = 3 

print (join_data)
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3

Solution with multiple numpy.where:

mask1 = join_data['cluster_x'].isin([3,2,4])
mask2 = join_data['cluster_y'].isin([3,1])
mask3 = join_data['cluster_x'].isin([1,5])
mask4 = join_data['cluster_y'].isin([2,4])

join_data['cluster_z'] = np.where(mask1 & mask2, 1,
                         np.where(mask3 & mask2, 2,
                         np.where(mask1 & mask4, 3, 4)))        

print (join_data)
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3

Upvotes: 2

Related Questions