Eli Turasky

Reputation: 1061

Subtract one value from all dataframe entries

I have a large dataframe of temperatures in Kelvin, and I want to convert all of them to Celsius. I can't find any examples where a single value is subtracted from every entry.

Here is my dataframe:

                       Antwerp      Busan    Colombo     Dalian  Guangzhou    Hamburg  Hong Kong  Jebel Ali/Dubai  Kaohsiung  Laem Chabang  ...  Rotterdam   Shanghai   Shenzhen  Singapore  Tanjung Pelepas  Tanjung Priok/Jakarta    Tianjin     Xiamen    Yingkou
time
1990-01-01 00:00:00  273.70395  279.31912  298.03195  268.42200  285.93228  271.31534  290.31357        289.83023  292.94135     298.48724  ...  274.18726  279.60450  288.37366  298.10950        298.23816              299.37143  272.06094  285.92570  265.19046
1990-01-01 01:00:00  273.72702  279.94266  298.02042  268.18445  286.04940  271.18503  290.59730        289.69333  292.95950     298.01053  ...  274.12128  280.13235  288.59967  298.21176        298.40808              299.59576  272.04776  286.36612  265.10303
1990-01-01 02:00:00  273.47134  280.65198  298.40310  269.00925  286.67624  271.22790  291.18784        289.33700  293.10632     301.11172  ...  273.94310  282.45330  289.25455  298.39322        298.64725              300.08075  272.84616  287.74683  265.73150

I just want to subtract 273 from every city column, while excluding the time column.

Upvotes: 1

Views: 1047

Answers (2)

jezrael

Reputation: 863301

From the sample data, time appears to be a DatetimeIndex, so just subtract the scalar value (273.15 is the exact Kelvin-to-Celsius offset):

df = df.sub(273.15)
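A minimal sketch of the DatetimeIndex case, assuming a small made-up frame with two cities (the values are illustrative, not taken from the question). The scalar is broadcast to every column, and the index is never part of the arithmetic:

```python
import pandas as pd

# Hypothetical readings indexed by a DatetimeIndex named 'time',
# mirroring the layout in the question (values are made up).
idx = pd.DatetimeIndex(
    ["1990-01-01 00:00:00", "1990-01-01 01:00:00", "1990-01-01 02:00:00"],
    name="time",
)
df = pd.DataFrame({"Antwerp": [273.15, 274.15, 272.15],
                   "Busan": [283.15, 293.15, 303.15]}, index=idx)

# Scalar subtraction is applied to every column; the index is left untouched.
celsius = df.sub(273.15)
print(celsius)
```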

If time is a column:

df = df.set_index('time').sub(273.15)

Or, if the first column is the time column:

df.iloc[:, 1:] = df.iloc[:, 1:].sub(273.15)
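If time is not guaranteed to be the first column, selecting the city columns by name is a slightly more robust variant of the iloc approach. A sketch, assuming a small hypothetical frame with a column called time; city_cols is just an illustrative variable name:

```python
import pandas as pd

# Hypothetical frame where 'time' is an ordinary column rather than the index.
df = pd.DataFrame({
    "time": pd.to_datetime(["1990-01-01 00:00:00", "1990-01-01 01:00:00"]),
    "Antwerp": [273.70395, 273.72702],
    "Busan": [279.31912, 279.94266],
})

# Select every column except 'time' by name, so the result does not depend on
# 'time' being the first column (unlike positional slicing with iloc[:, 1:]).
city_cols = df.columns.drop("time")
df[city_cols] = df[city_cols].sub(273.15)
print(df)
```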

Performance for 300k rows:

df = pd.concat([df] * 100000)
print(df)

In [170]: %timeit df.set_index('time').applymap(lambda value:value-273)
1.9 s ± 16.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [171]: %timeit df.set_index('time').sub(273.15)
95.6 ms ± 575 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Sample data:

import pandas as pd

df = pd.DataFrame({'time': [pd.Timestamp('1990-01-01 00:00:00'), pd.Timestamp('1990-01-01 01:00:00'), pd.Timestamp('1990-01-01 02:00:00')],
                   'Antwerp': [273.70395, 273.72702, 273.47134],
                   'Busan': [279.31912, 279.94266, 280.65198],
                   'Colombo': [298.03195, 298.02042, 298.4031],
                   'Dalian': [268.422, 268.18445, 269.00925]})
print(df)
                 time    Antwerp      Busan    Colombo     Dalian
0 1990-01-01 00:00:00  273.70395  279.31912  298.03195  268.42200
1 1990-01-01 01:00:00  273.72702  279.94266  298.02042  268.18445
2 1990-01-01 02:00:00  273.47134  280.65198  298.40310  269.00925
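Applied to the sample data, the set_index variant looks roughly like this. The trailing reset_index(), which turns time back into a regular column, is an addition in this sketch and not part of the answer above; drop it if you want to keep the DatetimeIndex:

```python
import pandas as pd

# Same sample data as in the answer ('time' is a regular column).
df = pd.DataFrame({'time': [pd.Timestamp('1990-01-01 00:00:00'), pd.Timestamp('1990-01-01 01:00:00'), pd.Timestamp('1990-01-01 02:00:00')],
                   'Antwerp': [273.70395, 273.72702, 273.47134],
                   'Busan': [279.31912, 279.94266, 280.65198],
                   'Colombo': [298.03195, 298.02042, 298.4031],
                   'Dalian': [268.422, 268.18445, 269.00925]})

# Move 'time' into the index, convert every remaining column, then restore
# 'time' as a column with reset_index() so the layout matches the input.
celsius = df.set_index('time').sub(273.15).reset_index()
print(celsius)
```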

Upvotes: 2

Mehul Gupta

Reputation: 1939

If time isn't the index:

df = df.set_index('time').applymap(lambda value:value-273).reset_index()

Otherwise:

df = df.applymap(lambda value: value-273)

applymap() applies a function element-wise to every value of the dataframe; the index is not included.
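A sketch comparing this element-wise approach with the vectorized sub() on a small hypothetical frame. It uses 273.15 instead of 273 so the two results can be compared directly, and it assumes a pandas version where either DataFrame.applymap or DataFrame.map exists (applymap was deprecated in pandas 2.1 in favour of DataFrame.map, so the code picks whichever is available):

```python
import pandas as pd

# Hypothetical frame with 'time' as a regular column (values are illustrative).
df = pd.DataFrame({
    "time": pd.to_datetime(["1990-01-01 00:00:00", "1990-01-01 01:00:00"]),
    "Antwerp": [273.70395, 273.72702],
    "Dalian": [268.422, 268.18445],
})

indexed = df.set_index("time")
# pandas 2.1 renamed DataFrame.applymap to DataFrame.map and deprecated the old
# name; use whichever exists so the sketch runs on older and newer versions.
elementwise = indexed.map if hasattr(pd.DataFrame, "map") else indexed.applymap

# Element-wise: the lambda is called once per cell, in Python.
slow = elementwise(lambda value: value - 273.15).reset_index()
# Vectorized: one subtraction over the whole block of values.
fast = indexed.sub(273.15).reset_index()

print(slow.equals(fast))
```

Both produce the same numbers; the difference is that the lambda runs once per cell in Python, which is why the element-wise version is much slower on large frames.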

Upvotes: 1
