user10337478
user10337478

Reputation:

Add an empty row if index of the column isn't there in pandas

I have two dataframes df and df1. so i want to take the difference df and df1 but the number of rows are not same. Like row number 5,6,7 of df1 are not there in df.

df

        wave  num  stlines     fwhm       EWs  MeasredWave  
0    4050.32    3  0.28269  0.07365  22.16080  4050.311360   
1    4208.98    5  0.48122  0.08765  44.90035  4208.972962   
2    4374.94    9  0.71483  0.11429  86.96497  4374.927110   
3    4379.74    9  0.31404  0.09107  30.44271  4379.760601   
4    4398.01   14  0.50415  0.09845  52.83236  4398.007473 
5    5520.50    1  0.06148  0.12556   8.21685  5520.484742   
6    6052.67    1  0.04802  0.24632  12.59131  6052.650064 

df1

        wave  num  stlines     fwhm       EWs  MeasredWave  
0    4050.32    3  0.28616  0.07521  22.91064  4050.327388   
1    4208.98    6  0.48781  0.08573  44.51609  4208.990029   
2    4374.94    9  0.71548  0.11437  87.10152  4374.944513   
3    4379.74   10  0.31338  0.09098  30.34791  4379.778009   
4    4398.01   15  0.49950  0.08612  45.78707  4398.020367   
5    4502.21    9  0.56362  0.10114  60.67868  4502.223123   
6    4508.28    3  0.69554  0.11600  85.88428  4508.291777   
7    4512.99    2  0.20486  0.08891  19.38745  4512.999332
8    5520.50    1  0.06148  0.12556   8.21685  5520.484742   
9    6052.67    1  0.04802  0.24632  12.59131  6052.650064 

path_to_files = '/home/Desktop/computed_2d_blaze/'
lst = []
for filen in [x for x in os.listdir(path_to_files) if '.ares' in x]:
   df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
   df = df.drop_duplicates('wave')
   df = df.sort_values('wave', ascending=True)
   df = df.reset_index(drop=True)
   df[filen] = None
   lst.append(df)




path_to_files1 = '/home/Desktop/computed_1d/'
lst1 = []
for filen in [x for x in os.listdir(path_to_files1) if '.ares' in x]:
   df1 = pd.read_table(path_to_files1+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
   df1 = df1.sort_values('wave', ascending=True)
   #df1 = df1.drop_duplicates('wave')
   df1 = df1.reset_index(drop=True)
   lst1.append(df1)

So what i want to do is like, i want to subtract the dataframes on the basis of cloumn wave. So i want to make sure that each index of df.wave gets subtracted from the correct index of df1. If the index of wave(row) is not there in dataframe then i want to add an empty row to make sure it doesn't messes up the rest of the data.

The desired output of df should look like this:

        wave  num  stlines     fwhm       EWs  MeasredWave  
0    4050.32    3  0.28269  0.07365  22.16080  4050.311360   
1    4208.98    5  0.48122  0.08765  44.90035  4208.972962   
2    4374.94    9  0.71483  0.11429  86.96497  4374.927110   
3    4379.74    9  0.31404  0.09107  30.44271  4379.760601   
4    4398.01   14  0.50415  0.09845  52.83236  4398.007473 
5       0.00    0  0.00     0.00      0.00        0.00
6       0.00    0  0.00     0.00      0.00        0.00
7       0.00    0  0.00     0.00      0.00        0.00
8    5520.50    1  0.06148  0.12556   8.21685  5520.484742   
9    6052.67    1  0.04802  0.24632  12.59131  6052.650064 

Upvotes: 0

Views: 770

Answers (1)

Vishnu Kunchur
Vishnu Kunchur

Reputation: 1726

Given:

df

      wave num  stlines     fwhm      EWs MeasredWave
0  4050.32   3  0.28269  0.07365  22.1608      4050.31
1  4208.98   5  0.48122  0.08765  44.9004      4208.97
2  4374.94   9  0.71483  0.11429   86.965      4374.93
3  4379.74   9  0.31404  0.09107  30.4427      4379.76
4  4398.01  14  0.50415  0.09845  52.8324      4398.01
5   5520.5   1  0.06148  0.12556  8.21685      5520.48
6  6052.67   1  0.04802  0.24632  12.5913      6052.65

df1:

      wave num  stlines     fwhm      EWs MeasredWave
0  4050.32   3  0.28269  0.07365  22.1608      4050.31
1  4208.98   5  0.48122  0.08765  44.9004      4208.97
2  4374.94   9  0.71483  0.11429   86.965      4374.93
3  4379.74   9  0.31404  0.09107  30.4427      4379.76
4  4398.01  14  0.50415  0.09845  52.8324      4398.01
5  4502.21   9  0.56362  0.10114  60.6787      4502.22
6  4508.28   3  0.69554    0.116  85.8843      4508.29
7  4512.99   2  0.20486  0.08891  19.3875         4513
8   5520.5   1  0.06148  0.12556  8.21685      5520.48
9  6052.67   1  0.04802  0.24632  12.5913      6052.65

Try this:

df_new = pd.merge(df1, df, on = 'wave', how = 'left')[['wave', 'num_y', 'stlines_y', 'fwhm_y', 'EWs_y', 'MeasredWave_y']]
df_new.fillna(0, inplace = True)
df_new['wave'] = df_new['wave']*df1['wave'].isin(df['wave']).astype(int)

Output:

df_new

      wave  num_y  stlines_y   fwhm_y     EWs_y  MeasredWave_y
0  4050.32      3    0.28269  0.07365  22.16080     4050.311360
1  4208.98      5    0.48122  0.08765  44.90035     4208.972962
2  4374.94      9    0.71483  0.11429  86.96497     4374.927110
3  4379.74      9    0.31404  0.09107  30.44271     4379.760601
4  4398.01     14    0.50415  0.09845  52.83236     4398.007473
5     0.00      0    0.00000  0.00000   0.00000        0.000000
6     0.00      0    0.00000  0.00000   0.00000        0.000000
7     0.00      0    0.00000  0.00000   0.00000        0.000000
8  5520.50      1    0.06148  0.12556   8.21685     5520.484742
9  6052.67      1    0.04802  0.24632  12.59131     6052.650064

Upvotes: 1

Related Questions