Reputation: 368
I want to return unique rows from multiple columns in a df
. The issue is I want to include the same set of values if they don't appear in the previous row. This is a little hard to explain so I'll display it;
df = pd.DataFrame({
'Time' : ['2019-08-02 09:50:10.1','2019-08-02 09:50:10.2','2019-08-02 09:50:10.3','2019-08-02 09:50:10.4','2019-08-02 09:50:10.5','2019-08-02 09:50:10.6','2019-08-02 09:50:10.7','2019-08-02 09:50:10.8','2019-08-02 09:50:10.9','2019-08-02 09:50:11.0'],
'Code1' : ['A','A','B','B','C','C','A','A','B','B'],
'Code2' : ['B','B','A','A','B','B','B','B','A','A'],
'Code3' : [np.nan,np.nan,'C','C','A','A','C','C','C','C'],
})
df = df[df.iloc[:, 1:].shift().ne(df.iloc[:, 1:]).any(axis=1)].reset_index(drop = True)
Intended Output:
Time Code1 Code2 Code3
0 2019-08-02 09:50:10.1 A B NaN
1 2019-08-02 09:50:10.3 B A C
2 2019-08-02 09:50:10.5 C B A
3 2019-08-02 09:50:10.7 A B C
4 2019-08-02 09:50:10.9 B A C
Upvotes: 0
Views: 52
Reputation: 109528
You can use a sentinel value for the nulls, then group on the result.
The groupby
is a variant of the shift-cumsum pattern.
sentinel = 99999999
mask = df[cols].fillna(sentinel).ne(df[cols].fillna(sentinel).shift())
>>> df.groupby(
mask.any(axis=1).cumsum(),
sort=False,
as_index=False
).first()
Time Code1 Code2 Code3
0 2019-08-02 09:50:10.1 A B NaN
1 2019-08-02 09:50:10.3 B A C
2 2019-08-02 09:50:10.5 C B A
3 2019-08-02 09:50:10.7 A B C
4 2019-08-02 09:50:10.9 B A C
Upvotes: 1
Reputation: 42886
First we use iloc
to select the correct columns, then we use shift
to check if current row is not equal to the next one. Finally we use any
over axis=1 (columns)
. Because A B C
and B A C
are different, but have C
in common:
df[df.iloc[:, 1:].shift().ne(df.iloc[:, 1:]).any(axis=1)]
Or the same, but written down little less concise:
mask = df.iloc[:, 1:].shift().ne(df.iloc[:, 1:])
df[mask.any(axis=1)]
Time Code1 Code2 Code3
0 2019-08-02 09:50:10.1 A B C
2 2019-08-02 09:50:10.3 B A C
4 2019-08-02 09:50:10.5 C B A
6 2019-08-02 09:50:10.7 A B C
8 2019-08-02 09:50:10.9 B A C
Upvotes: 1