Reputation: 359
Hi~ I'm handling my data.
I wanna extract data with conditional statements
Here is my code.
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import os
join_file = r'D:\handling data\complete data\조인\after_join.csv'
pwd = os.getcwd()
os.chdir(os.path.dirname(join_file))
join_data = pd.read_csv(os.path.basename(join_file), sep=',', encoding='utf-8')
print(join_data.head())
join_data['cluster_z'] = 4 # 둘다 하락세
join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4 )
& (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 1 # 다 상승세
join_data['cluster_z'][((join_data['cluster_x'] == 1 | join_data['cluster_x'] == 5)
& (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 2 # 전체 하락세, 점포당 상승세
join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4 )
& (join_data['cluster_y'] == 2 | join_data['cluster_y'] == 4))] = 3 # 전체 상승세, 점파당 하락세
print(join_data.head())
and after executing second print(join_data.head()). I got the error like picture
How can I fix it?? Thank in advance.
Upvotes: 1
Views: 156
Reputation: 862641
It seems you omit a lot of brackets between conditions, also better is use loc
:
Original:
join_data['cluster_z']
[((join_data['cluster_x'] == 3 |
join_data['cluster_x'] == 2 |
join_data['cluster_x'] == 4 ) &
(join_data['cluster_y'] == 3 |
join_data['cluster_y'] == 1))] = 1
Change to:
join_data.loc[
((join_data['cluster_x'] == 3) |
(join_data['cluster_x'] == 2) |
(join_data['cluster_x'] == 4) ) &
((join_data['cluster_y'] == 3) |
(join_data['cluster_y'] == 1)), 'cluster_z'] = 1
Or better use isin
:
join_data.loc[
(join_data['cluster_x'].isin([3,2,4])) &
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1
All together:
join_data = pd.DataFrame({'cluster_x':[3,2,5,3],
'cluster_y':[3,0,1,2]})
print (join_data)
cluster_x cluster_y
0 3 3
1 2 0
2 5 1
3 3 2
join_data['cluster_z'] = 4
join_data.loc[
(join_data['cluster_x'].isin([3,2,4])) &
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1
join_data.loc[
(join_data['cluster_x'].isin([1,5])) &
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 2
join_data.loc[
(join_data['cluster_x'].isin([3,2,4])) &
(join_data['cluster_y'].isin([2,4])), 'cluster_z'] = 3
print (join_data)
cluster_x cluster_y cluster_z
0 3 3 1
1 2 0 4
2 5 1 2
3 3 2 3
Or more readable:
mask1 = join_data['cluster_x'].isin([3,2,4])
mask2 = join_data['cluster_y'].isin([3,1])
mask3 = join_data['cluster_x'].isin([1,5])
mask4 = join_data['cluster_y'].isin([2,4])
join_data['cluster_z'] = 4
join_data.loc[mask1 & mask2 , 'cluster_z'] = 1
join_data.loc[mask3 & mask2 , 'cluster_z'] = 2
join_data.loc[mask1 & mask4 , 'cluster_z'] = 3
print (join_data)
cluster_x cluster_y cluster_z
0 3 3 1
1 2 0 4
2 5 1 2
3 3 2 3
Solution with multiple numpy.where
:
mask1 = join_data['cluster_x'].isin([3,2,4])
mask2 = join_data['cluster_y'].isin([3,1])
mask3 = join_data['cluster_x'].isin([1,5])
mask4 = join_data['cluster_y'].isin([2,4])
join_data['cluster_z'] = np.where(mask1 & mask2, 1,
np.where(mask3 & mask2, 2,
np.where(mask1 & mask4, 3, 4)))
print (join_data)
cluster_x cluster_y cluster_z
0 3 3 1
1 2 0 4
2 5 1 2
3 3 2 3
Upvotes: 2