Reputation: 113
I have a csv with 4 columns. The file contains some missing rows based on the series.
Input:-
No A B C
1 10 50 12
3 40 50 12
4 20 60 15
6 80 80 18
Output:-
No A B C
1 10 50 12
2 10 50 12
3 40 50 12
4 20 60 15
5 20 60 15
6 80 80 18
I need python and pandas code to generate the above output.
Upvotes: 1
Views: 115
Reputation: 10960
Create a dataframe of your missing rows
missing_list = [[i] + [pd.np.nan]*(df.shape[1] - 1) for i in range(df.No.min(), df.No.max()) if i not in df.No]
missing_df = pd.DataFrame(missing_list, columns=df.columns)
Concat to original dataframe, sort and forward fill
pd.concat([df, missing_df]).sort_values('No').ffill()
Upvotes: 0
Reputation: 862481
Use if No
is column - create index by No
and DataFrame.reindex
by range
with all possible values:
v = range(df['No'].min(), df['No'].max() + 1)
df1 = df.set_index('No').reindex(v, method='ffill').reset_index()
print (df1)
No A B C
0 1 10 50 12
1 2 10 50 12
2 3 40 50 12
3 4 20 60 15
4 5 20 60 15
5 6 80 80 18
Use if No
is index solution is changed a bit:
v = range(df.index.min(), df.index.max() + 1)
df1 = df.reindex(v, method='ffill')
print (df1)
A B C
No
1 10 50 12
2 10 50 12
3 40 50 12
4 20 60 15
5 20 60 15
6 80 80 18
Upvotes: 1