akkab

Reputation: 401

If/else statement within loop over dataframe

I have a dataframe with three columns: Depth, Shale Volume and Density.

What I need to do is calculate porosity based on the shale volume and density. Where the shale volume is > 0.7 I apply certain parameters for the porosity calculation, and where the volume is < 0.2 I use other parameters.

For example, if the shale volume is < 0.2:

 porosity=density*2.3

and if the shale volume is > 0.7:

 porosity=density*1.7

This is an example of part of the dataframe I have:

 depth       density    VSH
 5517        2.126      0.8347083
 5517.5      2.123      0.8310949
 5518        2.124      0.8012414
 5518.5      2.121      0.7838615
 5519        2.116      0.7674243
 5519.5      2.127      0.8405414

This is the code I am trying. I want it to be a for loop because it will serve future purposes:

 for index, row in data.iterrows():
     if data.loc[index, 'VSH']<0.2:
          data.loc[index,'porosity']=(data['density']*2.3)
     elif data.loc[index, 'VSH'] > 0.7:
          data.loc[index,'porosity']=(data['density']*1.7)

The error I am getting is the following. It would be great if you could help:

 TypeError: '<' not supported between instances of 'str' and 'float'

Upvotes: 2

Views: 71

Answers (1)

jezrael

Reputation: 862581

iterrows is a bad choice here, because it is slow and a vectorized solution exists; see "Does pandas iterrows have performance issues?"

So use numpy.select:

import numpy as np

m1 = data['VSH'] < 0.2
m2 = data['VSH'] > 0.7
s1 = data['density']*2.3
s2 = data['density']*1.7

data['porosity'] = np.select([m1, m2], [s1, s2])

print (data)
    depth  density       VSH  porosity
0  5517.0    2.126  0.834708    3.6142
1  5517.5    2.123  0.831095    3.6091
2  5518.0    2.124  0.801241    3.6108
3  5518.5    2.121  0.783861    3.6057
4  5519.0    2.116  0.767424    3.5972
5  5519.5    2.127  0.840541    3.6159
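
As a small illustration of how numpy.select treats rows that match none of the conditions, here is a sketch using made-up density and VSH values (they are not from the question's data). When no default is passed, such rows get 0:

```python
import numpy as np
import pandas as pd

# Made-up sample values chosen to hit all three cases
demo = pd.DataFrame({
    "density": [2.0, 2.0, 2.0],
    "VSH": [0.1, 0.5, 0.9],
})

conditions = [demo["VSH"] < 0.2, demo["VSH"] > 0.7]
choices = [demo["density"] * 2.3, demo["density"] * 1.7]

# Rows matching neither condition fall back to the default, which is 0
demo["porosity"] = np.select(conditions, choices)
print(demo)
```

The middle row (VSH of 0.5) ends up with a porosity of 0, which is rarely what is wanted for real data.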

It is also better to define what happens between 0.2 and 0.7, e.g. return the values of the data['density'] column via the default parameter:

data['porosity'] = np.select([m1, m2], [s1, s2], default=data['density'])

print (data)
    depth  density       VSH  porosity
0  5517.0    2.126  0.834708    3.6142
1  5517.5    2.123  0.831095    3.6091
2  5518.0    2.124  0.801241    3.6108
3  5518.5    2.121  0.783861    3.6057
4  5519.0    2.116  0.767424    3.5972
5  5519.5    2.127  0.840541    3.6159
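
The TypeError in the question suggests that the VSH column holds strings rather than floats, in which case the comparisons above would fail in the same way. A minimal sketch, assuming the values are numeric text (the frame below is built by hand to mimic that, and the NaN default is only one possible choice), converts the columns with pd.to_numeric first:

```python
import numpy as np
import pandas as pd

# Hand-built frame where VSH was read as strings (assumed cause of the TypeError)
data = pd.DataFrame({
    "depth": [5517.0, 5517.5, 5518.0],
    "density": [2.126, 2.123, 2.124],
    "VSH": ["0.8347083", "0.8310949", "0.8012414"],
})

# Convert to float; entries that cannot be parsed become NaN instead of raising
data["VSH"] = pd.to_numeric(data["VSH"], errors="coerce")
data["density"] = pd.to_numeric(data["density"], errors="coerce")

m1 = data["VSH"] < 0.2
m2 = data["VSH"] > 0.7
data["porosity"] = np.select(
    [m1, m2],
    [data["density"] * 2.3, data["density"] * 1.7],
    default=np.nan,  # assumption: leave rows outside both ranges undefined
)
print(data)
```

If a loop is still required, the right-hand side of each assignment has to be a scalar for that row, for example row['density'] * 2.3, rather than the whole data['density'] column.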

Upvotes: 2
