sundar_ima
sundar_ima

Reputation: 3890

Convert specific string to a numeric value in pandas

I am trying to do data analysis of some rainfall data. Example of the data looks like this:-

10   18/05/2016   26.9     40    20.8     34   52.2    20.8   46.5     45   
11   19/05/2016   25.5     32     0.3   41.6     42     0.3   56.3   65.2   
12   20/05/2016    8.5     29    18.4      9     36    18.4   28.6     46   
13   21/05/2016   24.5     18   TRACE    3.5     17   TRACE    4.4     40   
14   22/05/2016    0.6     18       0    6.5     14       0    8.6     20   
15   23/05/2016    3.5      9     0.6    4.3     14     0.6      7     15   
16   24/05/2016    3.6     25       T      3     12       T   14.9      9   
17   25/05/2016     25     21     2.2   25.6     50     2.2     25      9   

The rainfall data contain a specific string 'TRACE' or 'T' (both meaning non measurable rainfall amount). For analysis, I would like to convert this strings in to '1.0' (float). My desired data should look like this so as to plot the values as line diagram:-

10   18/05/2016   26.9     40    20.8     34   52.2    20.8   46.5     45   
11   19/05/2016   25.5     32     0.3   41.6     42     0.3   56.3   65.2   
12   20/05/2016    8.5     29    18.4      9     36    18.4   28.6     46   
13   21/05/2016   24.5     18     1.0    3.5     17     1.0    4.4     40   
14   22/05/2016    0.6     18       0    6.5     14       0    8.6     20   
15   23/05/2016    3.5      9     0.6    4.3     14     0.6      7     15   
16   24/05/2016    3.6     25     1.0      3     12     1.0   14.9      9   
17   25/05/2016     25     21     2.2   25.6     50     2.2     25      9   

Can some one point me to right direction?

Upvotes: 1

Views: 316

Answers (3)

cs95
cs95

Reputation: 402263

You can use df.replace, and then converting the numeric to float using df.astype (the original datatype would be object, so any operations on these columns would still suffer from performance issues):

df = df.replace('^T(RACE)?$', 1.0, regex=True)
df.iloc[:, 1:] = df.iloc[:, 1:].astype(float) # converting object columns to floats 

This will replace all T or TRACE elements with 1.0.

Output:

10  18/05/2016  26.9  40  20.8  34.0  52.2  20.8  46.5  45.0
11  19/05/2016  25.5  32   0.3  41.6  42.0   0.3  56.3  65.2
12  20/05/2016   8.5  29  18.4   9.0  36.0  18.4  28.6  46.0
13  21/05/2016  24.5  18     1   3.5  17.0     1   4.4  40.0
14  22/05/2016   0.6  18     0   6.5  14.0     0   8.6  20.0
15  23/05/2016   3.5   9   0.6   4.3  14.0   0.6   7.0  15.0
16  24/05/2016   3.6  25     1   3.0  12.0     1  14.9   9.0
17  25/05/2016  25.0  21   2.2  25.6  50.0   2.2  25.0   9.0

Upvotes: 2

Alexander
Alexander

Reputation: 109520

Extending the answer from @jezrael, you can replace and convert to floats in a single statement (assumes the first column is Date and the remaining are the desired numeric columns):

df.iloc[:, 1:] = df.iloc[:, 1:].replace({'T':1.0, 'TRACE':1.0}).astype(float)

Upvotes: 0

jezrael
jezrael

Reputation: 862441

Use replace by dict:

df = df.replace({'T':1.0, 'TRACE':1.0})

And then if necessary convert columns to float:

cols = df.columns.difference(['Date','another cols dont need convert'])
df[cols] = df[cols].astype(float)

df = df.replace({'T':1.0, 'TRACE':1.0})
cols = df.columns.difference(['Date','a'])
df[cols] = df[cols].astype(float)
print (df)
    a        Date     2     3     4     5     6     7     8     9
0  10  18/05/2016  26.9  40.0  20.8  34.0  52.2  20.8  46.5  45.0
1  11  19/05/2016  25.5  32.0   0.3  41.6  42.0   0.3  56.3  65.2
2  12  20/05/2016   8.5  29.0  18.4   9.0  36.0  18.4  28.6  46.0
3  13  21/05/2016  24.5  18.0   1.0   3.5  17.0   1.0   4.4  40.0
4  14  22/05/2016   0.6  18.0   0.0   6.5  14.0   0.0   8.6  20.0
5  15  23/05/2016   3.5   9.0   0.6   4.3  14.0   0.6   7.0  15.0
6  16  24/05/2016   3.6  25.0   1.0   3.0  12.0   1.0  14.9   9.0
7  17  25/05/2016  25.0  21.0   2.2  25.6  50.0   2.2  25.0   9.0

print (df.dtypes)
a         int64
Date     object
2       float64
3       float64
4       float64
5       float64
6       float64
7       float64
8       float64
9       float64
dtype: object

Upvotes: 2

Related Questions