Reputation: 2219
The best version of a question that relates to my question is found here. But I'm running into a hiccup somewhere.
My dataframe:
import pandas as pd

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                   'RO_1': [1, 1, 4, 1],
                   'RO_2': [1, 0, 0, 0],
                   'RO_3': [1, 1, 1, 1],
                   'RO_4': [1, 4, 1, 1]})
KEY RO_1 RO_2 RO_3 RO_4
0 100000003 1 1 1 1
1 100000009 1 0 1 4
2 100000009 4 0 1 1
3 100000009 1 0 1 1
I want to create four additional columns labeled 'Month1' through 'Month4'. Something simple like:
for i in range(3):
    df.loc[1, 'Month' + str(i)] = 1  # '1' is just there as a placeholder
However, I get a warning message when I execute this code:
"A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead"
I want to combine this with conditionals to fill in each cell for each column and each row.
The code below creates only one column ('Month'), flagged according to whether an RO_ column equals 1 or 4:
import numpy as np
namelist = df.columns.tolist()  # Index.get_values() was removed in pandas 1.0
ROList = [s for s in namelist if "RO_" in s]
for col in ROList:
    for i in range(3):
        # 'Month' is overwritten on every pass, so only one column survives
        df['Month'] = np.where(np.logical_or(df[col] == 4, df[col] == 1), '1', '0')
df
I tried combining the two pieces of code, but I am missing a fundamental understanding of how to do this. Any help would be great.
Final expected result:
KEY RO_1 RO_2 RO_3 RO_4 Month1 Month2 Month3 Month4
0 100000003 1 1 1 1 1 1 1 1
1 100000009 1 0 1 4 1 0 1 1
2 100000009 4 0 1 1 1 0 1 1
3 100000009 1 0 1 1 1 0 1 1
Upvotes: 0
Views: 1420
Reputation: 323226
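IIUC, you can use enumerate to number the new columns:
namelist = df.columns.tolist()  # Index.get_values() was removed in pandas 1.0
ROList = [s for s in namelist if "RO_" in s]
for i, col in enumerate(ROList):
    # one 'MonthN' column per RO_ column, numbered from 1
    df['Month' + str(i + 1)] = np.where(np.logical_or(df[col] == 4, df[col] == 1), '1', '0')
df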
Out[194]:
KEY RO_1 RO_2 RO_3 RO_4 Month1 Month2 Month3 Month4
0 100000003 1 1 1 1 1 1 1 1
1 100000009 1 0 1 4 1 0 1 1
2 100000009 4 0 1 1 1 0 1 1
3 100000009 1 0 1 1 1 0 1 1
Your logic seems to amount to changing 4 to 1, in which case mask also works:
df.assign(**df.loc[:, ROList]
            .mask(df.loc[:, ROList] == 4, 1)
            .rename(columns=dict(zip(ROList, range(1, len(ROList) + 1))))
            .add_prefix('Month'))
Out[15]:
KEY RO_1 RO_2 RO_3 RO_4 Month1 Month2 Month3 Month4
0 100000003 1 1 1 1 1 1 1 1
1 100000009 1 0 1 4 1 0 1 1
2 100000009 4 0 1 1 1 0 1 1
3 100000009 1 0 1 1 1 0 1 1
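Note that replacing 4 with 1 matches a 0/1 flag only while the RO_ columns contain nothing but 0, 1, and 4, as they do in the sample data. A minimal sketch of the difference, using a hypothetical frame `demo` that includes a 2 (not part of the original question):

```python
import pandas as pd

# Hypothetical data: the value 2 does not occur in the question's sample
demo = pd.DataFrame({'RO_1': [1, 4, 2, 0]})

# mask only rewrites 4 -> 1; every other value passes through unchanged
masked = demo['RO_1'].mask(demo['RO_1'] == 4, 1)

# an explicit membership test always yields a strict 0/1 flag
flagged = demo['RO_1'].isin([1, 4]).astype(int)

print(masked.tolist())   # [1, 1, 2, 0]
print(flagged.tolist())  # [1, 1, 0, 0]
```

If other values can appear in the real data, the membership test is probably the safer of the two.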
Upvotes: 2
Reputation: 402463
Use filter + isin + rename for a single pipelined transformation of your data.
v = (df.filter(regex='^RO_')   # select columns
       .isin([4, 1])           # check if the value is 4 or 1
       .astype(int)            # convert the `bool` result to `int`
       .rename(                # rename columns
           columns=lambda x: x.replace('RO_', 'Month')
       ))
Or, for the sake of performance,
v = df.filter(regex='^RO_')\
      .isin([4, 1])\
      .astype(int)
v.columns = v.columns.str.replace('RO_', 'Month')
Finally, concatenate the result with the original.
pd.concat([df, v], axis=1)
KEY RO_1 RO_2 RO_3 RO_4 Month1 Month2 Month3 Month4
0 100000003 1 1 1 1 1 1 1 1
1 100000009 1 0 1 4 1 0 1 1
2 100000009 4 0 1 1 1 0 1 1
3 100000009 1 0 1 1 1 0 1 1
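For anyone who wants to run the whole thing in one go, here is a self-contained sketch of the steps above on the sample frame from the question. The names `flags` and `result` are only illustrative, and the column rename uses a plain list comprehension instead of `.str.replace`.

```python
import pandas as pd

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                   'RO_1': [1, 1, 4, 1],
                   'RO_2': [1, 0, 0, 0],
                   'RO_3': [1, 1, 1, 1],
                   'RO_4': [1, 4, 1, 1]})

# Select the RO_ columns, flag values equal to 1 or 4, and cast bool -> int
flags = df.filter(regex='^RO_').isin([4, 1]).astype(int)

# Rename RO_1 -> Month1, RO_2 -> Month2, and so on
flags.columns = [c.replace('RO_', 'Month') for c in flags.columns]

# Attach the flag columns to the right of the original frame
result = pd.concat([df, flags], axis=1)
print(result)
```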
Upvotes: 2
Reputation: 9018
It seems like you are creating a new column for each existing RO_ column in your dataframe. You can do something like:
# restrict the loop to the RO_ columns so that 'KEY' is skipped
original_cols = [c for c in df.columns if c.startswith("RO_")]
for c in original_cols:
    cname = "Month" + c.split("_")[-1]  # 'RO_1' -> 'Month1'
    df[cname] = df[c].apply(lambda x: 1 if (x == 1) or (x == 4) else 0)
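The element-wise apply can also be swapped for a vectorized membership test. The sketch below assumes only the RO_ columns should be flagged (the list `ro_cols` is an illustrative name) and checks that both variants agree on the sample data before assigning:

```python
import pandas as pd

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                   'RO_1': [1, 1, 4, 1],
                   'RO_2': [1, 0, 0, 0],
                   'RO_3': [1, 1, 1, 1],
                   'RO_4': [1, 4, 1, 1]})

# Assumption: only columns whose name starts with 'RO_' get a Month column
ro_cols = [c for c in df.columns if c.startswith('RO_')]

for c in ro_cols:
    cname = 'Month' + c.split('_')[-1]  # 'RO_1' -> 'Month1'
    by_apply = df[c].apply(lambda x: 1 if x in (1, 4) else 0)
    by_isin = df[c].isin([1, 4]).astype(int)
    # Both approaches should produce the same flags
    assert by_apply.tolist() == by_isin.tolist()
    df[cname] = by_isin

print(df)
```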
Upvotes: 0