Manipulate Data, based on multiple columns

Question

My data is like this:

Unique_Number       information       complete_information 
    1                 Hello              Hello World 
    1                 Hello                 
    1                 Wrong Info      
    2                 R                  R, Python 
    2                
    3                 OverFlow           Stackoverflow 
    4                 Only info
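To reproduce the example, the input can be built as a DataFrame. This is a minimal sketch that assumes the blank cells in the question are missing values (None) rather than empty strings.

```python
import pandas as pd

# Sample data from the question; blank cells are assumed to be None (missing)
df = pd.DataFrame({
    'Unique_Number': [1, 1, 1, 2, 2, 3, 4],
    'information': ['Hello', 'Hello', 'Wrong Info', 'R', None, 'OverFlow', 'Only info'],
    'complete_information': ['Hello World', None, None, 'R, Python', None,
                             'Stackoverflow', None],
})
print(df)
```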

What I want to achieve:

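For rows that share the same Unique_Number (e.g. all the 1s, all the 2s, etc.), the value from the complete_information column should be copied into the information column of every row in that group. If a group has no complete_information (like Unique_Number 4), its information value should stay as it is.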

Desired output:

Unique_Number       information       complete_information 
    1                 Hello World          Hello World 
    1                 Hello World                
    1                 Hello World      
    2                 R, Python            R, Python 
    2                 R, Python
    3                 Stackoverflow        Stackoverflow 
    4                 Only info

I couldn't figure out good logic for this. I tried looping over all the Unique_Number values and pasting the complete_information value whenever the Unique_Number was the same, but ended up with a messy dataset.
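The loop idea can work if it iterates over groups rather than individual rows. A minimal sketch, assuming blank cells are None: for each Unique_Number it looks for a non-missing complete_information value and, if one exists, writes it into information for every row of that group. It is slower than a vectorized solution on large data.

```python
import pandas as pd

df = pd.DataFrame({
    'Unique_Number': [1, 1, 1, 2, 2, 3, 4],
    'information': ['Hello', 'Hello', 'Wrong Info', 'R', None, 'OverFlow', 'Only info'],
    'complete_information': ['Hello World', None, None, 'R, Python', None,
                             'Stackoverflow', None],
})

# Iterate over each Unique_Number group instead of comparing rows pairwise
for number, group in df.groupby('Unique_Number'):
    complete = group['complete_information'].dropna()
    if not complete.empty:
        # Copy the group's complete_information into every row of the group
        df.loc[group.index, 'information'] = complete.iloc[0]
    # Groups without complete_information (e.g. 4) are left unchanged

print(df)
```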

anky · Accepted Answer

You can use:

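import numpy as np

# Where complete_information is present, copy it into information
df['information'] = np.where(df['complete_information'].notna(),
                             df['complete_information'], df['information'])
# Broadcast the first non-null information value of each group to all its rows
# (relies on the row carrying complete_information coming first in its group)
df['information'] = df.groupby('Unique_Number')['information'].transform('first')
print(df)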

   Unique_Number    information complete_information
0              1    Hello World          Hello World
1              1    Hello World                 None
2              1    Hello World                 None
3              2      R, Python            R, Python
4              2      R, Python                 None
5              3  Stackoverflow        Stackoverflow
6              4      Only info                 None
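If the row that carries complete_information is not always the first row of its group, a variant that takes the group value from complete_information directly may be safer. This is a sketch, not part of the accepted answer: it assumes blank cells are None, and the sample below reorders group 1 on purpose. On this data the np.where + transform('first') approach would give 'Hello' for group 1. One behavioral difference: in a group with no complete_information but several different information values, this version keeps each row's own value instead of propagating the first one.

```python
import pandas as pd

df = pd.DataFrame({
    'Unique_Number': [1, 1, 1, 2, 2, 3, 4],
    # Group 1 reordered so the row with complete_information comes last
    'information': ['Hello', 'Wrong Info', 'Hello', 'R', None, 'OverFlow', 'Only info'],
    'complete_information': [None, None, 'Hello World', 'R, Python', None,
                             'Stackoverflow', None],
})

# First non-null complete_information of each group, broadcast to every row
group_complete = df.groupby('Unique_Number')['complete_information'].transform('first')
# Groups without any complete_information fall back to their own information
df['information'] = group_complete.fillna(df['information'])
print(df)
```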

If the empty cells in complete_information are blank strings rather than missing values, you may need to replace them with np.nan first, or replace df.complete_information.notna() in np.where(..) with df.complete_information.ne('').
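A minimal sketch of the first option, assuming the empty cells are exactly '' (cells containing only whitespace would need a regex such as r'^\s*$' with regex=True instead). Converting the blanks to NaN up front also lets groupby's 'first' skip them in the information column.

```python
import numpy as np
import pandas as pd

# Same data, but empty cells stored as blank strings instead of None
df = pd.DataFrame({
    'Unique_Number': [1, 1, 1, 2, 2, 3, 4],
    'information': ['Hello', 'Hello', 'Wrong Info', 'R', '', 'OverFlow', 'Only info'],
    'complete_information': ['Hello World', '', '', 'R, Python', '',
                             'Stackoverflow', ''],
})

# Treat blank strings as missing so notna() and 'first' ignore them
df = df.replace('', np.nan)

df['information'] = np.where(df['complete_information'].notna(),
                             df['complete_information'], df['information'])
df['information'] = df.groupby('Unique_Number')['information'].transform('first')
print(df)
```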
