Reputation: 2146
I have a CSV like below. And I want to duplicate row when c2,c3 both are some numbers. Like the last row
Initial input
C1,C2,C3
1,2,NaN
1,NaN,3
2,4,5 #both C2C3 not NaN change this row to 2 separate rows
Expected output
C1,C2,C3
1,2,NaN #nochange
1,NaN,3 #nochange
2,NaN,5 #split1
2,4,NaN #split2
This looks simple but I'm unable to find a way.
Upvotes: 0
Views: 1117
Reputation: 862641
You can use:
notnull
and boolean indexing
concat
df
with removed columns by drop
FloatIndex
for later sorting - new row is always after first oneconcat
rows with NaNs
- ~
is for inversing boolean masksort_index
and remove FloatIndex
by reset_index
print (df)
C1 C2 C3
0 1 2.0 NaN
1 4 7.0 8.0
2 1 NaN 3.0
3 2 4.0 5.0
mask = df['C2'].notnull() & df['C3'].notnull()
df1 = df[mask]
df1 = pd.concat([df1.drop('C2',1), df1.drop('C3',1)])
df1.index = df1.index.where(df1.index.duplicated(keep='last'), df1.index + .1)
print (df1)
C1 C2 C3
1.0 4 NaN 8.0
3.0 2 NaN 5.0
1.1 4 7.0 NaN
3.1 2 4.0 NaN
df2 = pd.concat([df[~mask], df1]).sort_index().reset_index(drop=True)
print (df2)
C1 C2 C3
0 1 2.0 NaN
1 4 NaN 8.0
2 4 7.0 NaN
3 1 NaN 3.0
4 2 NaN 5.0
5 2 4.0 NaN
Upvotes: 2