Reputation: 1352
My data is like this:
Unique_Number  information  complete_information
1              Hello        Hello World
1              Hello
1              Wrong Info
2              R            R, Python
2
3              OverFlow     Stackoverflow
4              Only info
What I want to achieve:
For all rows that share the same Unique_Number (e.g. all the 1's, all the 2's, etc.), the value from the complete_information column should be copied into the information column.
Desired output:
Unique_Number  information    complete_information
1              Hello World    Hello World
1              Hello World
1              Hello World
2              R, Python      R, Python
2              R, Python
3              Stackoverflow  Stackoverflow
4              Only info
I couldn't figure out good logic for this. I tried looping over all the Unique_Numbers and pasting the complete_information values wherever the Unique_Number was the same, but I ended up with a messy dataset.
Upvotes: 1
Views: 60
Reputation: 92854
With a mask based on shifted values (pandas.Series.shift):
In [723]: m = (df['Unique_Number'].shift(-1) == df['Unique_Number']) | (df['Unique_Number'] == df['Unique_Number'].shift(1))
In [724]: df.loc[m, 'information'] = df.loc[m, 'complete_information'].ffill()
In [725]: df
Out[725]:
   Unique_Number  information  complete_information
0  1              Hello World  Hello World
1  1              Hello World  None
2  1              Hello World  None
3  2              R, Python    R, Python
4  2              R, Python    None
5  3              OverFlow     Stackoverflow
6  4              Only info    None
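For anyone who wants to reproduce the mask-based approach end to end, here is a minimal, self-contained sketch. The DataFrame is reconstructed from the question, and representing the blank cells as None is an assumption; `.ffill()` is used as the equivalent of `fillna(method='ffill')`, which recent pandas versions deprecate.

```python
import pandas as pd

# Sample data reconstructed from the question; None marks the blank cells (an assumption).
df = pd.DataFrame({
    "Unique_Number": [1, 1, 1, 2, 2, 3, 4],
    "information": ["Hello", "Hello", "Wrong Info", "R", None, "OverFlow", "Only info"],
    "complete_information": ["Hello World", None, None, "R, Python", None, "Stackoverflow", None],
})

# True where a row shares its Unique_Number with the row directly above or below it.
key = df["Unique_Number"]
m = (key.shift(-1) == key) | (key == key.shift(1))

# Forward-fill complete_information across the masked rows and copy it into information.
df.loc[m, "information"] = df.loc[m, "complete_information"].ffill()

print(df)
```

Two things to keep in mind with this sketch: the mask only covers Unique_Numbers that occupy at least two adjacent rows, so single-row groups (3 and 4 here) are left unchanged, and it assumes equal Unique_Numbers sit next to each other, so sorting by Unique_Number first may be needed.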
Upvotes: 1
Reputation: 75080
You can use:
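import numpy as np

# Prefer complete_information wherever it is present, otherwise keep information
df['information'] = np.where(df.complete_information.notna(),
                             df.complete_information, df.information)
# Broadcast the first value of each Unique_Number group to all of its rows
df['information'] = df.groupby('Unique_Number')['information'].transform('first')
print(df)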
   Unique_Number  information    complete_information
0  1              Hello World    Hello World
1  1              Hello World    None
2  1              Hello World    None
3  2              R, Python      R, Python
4  2              R, Python      None
5  3              Stackoverflow  Stackoverflow
6  4              Only info      None
(If the blank cells in complete_information are empty strings rather than missing values, either replace them with np.nan first, or replace df.complete_information.notna() in np.where(..) with df.complete_information.ne('').)
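A minimal sketch of the empty-string case, in case it helps. The sample data is reconstructed from the question, with the blank cells stored as '' (an assumption), and the temporary name `complete` is only for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data, but with blank cells stored as empty strings (an assumption).
df = pd.DataFrame({
    "Unique_Number": [1, 1, 1, 2, 2, 3, 4],
    "information": ["Hello", "Hello", "Wrong Info", "R", "", "OverFlow", "Only info"],
    "complete_information": ["Hello World", "", "", "R, Python", "", "Stackoverflow", ""],
})

# Turn empty strings into NaN so that notna() can detect the blanks.
complete = df["complete_information"].replace("", np.nan)

# Prefer complete_information where present, otherwise keep information.
df["information"] = np.where(complete.notna(), complete, df["information"])

# Broadcast the first value of each Unique_Number group to all of its rows.
df["information"] = df.groupby("Unique_Number")["information"].transform("first")

print(df)
```

This relies on the row that carries complete_information being the first row of its group, as it is in the question's data; if that is not guaranteed, the groups would need to be ordered accordingly before calling transform('first').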
Upvotes: 1