Reputation: 33
I have data like
ca ca ca 120.00
ca cc cd 130.00
ca ca ca 135.23
ca ha ca 60.00
ca ha ca 50.00
If first 3 columns are equal then fourth column value should be the first occurrence. I want data like
ca ca ca 120.00
ca cc cd 130.00
ca ca ca 120.00
ca ha ca 60.00
ca ha ca 60.00
Please help me to solve this
Upvotes: 3
Views: 316
Reputation: 862581
Use GroupBy.transform
with GroupBy.first
Dynamic solution with selecting first 3 columns to list and processing 4th column assigned back:
df.iloc[:, 3] = df.groupby(df.columns[:3].tolist())[df.columns[3]].transform('first')
print (df)
0 1 2 3
0 ca ca ca 120.0
1 ca cc cd 130.0
2 ca ca ca 120.0
3 ca ha ca 60.0
4 ca ha ca 60.0
If there are 4 columns names like a,b,c,d
solution is simplier:
df['d'] = df.groupby(['a','b','c'])['d'].transform('first')
print (df)
a b c d
0 ca ca ca 120.0
1 ca cc cd 130.0
2 ca ca ca 120.0
3 ca ha ca 60.0
4 ca ha ca 60.0
Upvotes: 2
Reputation: 356
Considering this is your DataFrame:
import pandas as pd
df = pd.DataFrame([['ca', 'ca', 'ca', 120.00], ['ca', 'cc', 'cd', 130.00], ['ca', 'ca', 'ca', 135.23], ['ca', 'ha', 'ca', 60.00], ['ca', 'ha', 'ca', 50.00]], columns=['a', 'b', 'c', 'd'])
Let's construct a res_df, where only column A = Column b = Column C
res_df = df[(df['a'] == df['b']) & (df['a'] == df['c'])]
Let's get unique results from result set.
unique_vals = res_df['a'].unique()
Now let's loop these values. Step 1: getting the array of indices of the same type Step 2: get the value of column D of the first index that is get_indx[0] i.e. 0th position
df.loc[get_indx[0], 'd']
Step 3: set it to the original df from get_index[1:] i.e poition 1 till end
df.at[get_indx[i], 'd']
for value in unique_vals:
get_indx = res_df[res_df['a'] == value].index
if len(get_indx) > 1:
for i in range(1, len(get_indx)):
df.at[get_indx[i], 'd'] = df.loc[get_indx[0], 'd']
Upvotes: 0