Reputation: 653
I have the following dataframe:
from pandas import *
from math import *
data=read_csv('agosto.csv')
Fecha DirViento MagViento
0 2011/07/01 00:00 N 6.6
1 2011/07/01 00:15 N 5.5
2 2011/07/01 00:30 N 6.6
3 2011/07/01 00:45 N 7.5
4 2011/07/01 01:00 --- 6.0
5 2011/07/01 01:15 --- 7.1
6 2011/07/01 01:30 S 4.7
7 2011/07/01 01:45 SE 3.1
.
.
.
The first thing I want to do is convert the wind directions to numerical values in order to obtain the u and v wind components. But when I perform the operations, the missing data (---) causes problems.
direccion=[]
for i in data['DirViento']:
    if i=='SSW':
        dir=202.5
    if i=='S':
        dir=180.0
    if i=='N':
        dir=360.0
    if i=='NNE':
        dir=22.5
    if i=='NE':
        dir=45.0
    if i=='ENE':
        dir=67.5
    if i=='E':
        dir=90.0
    if i=='ESE':
        dir=112.5
    if i=='SE':
        dir=135.0
    if i=='SSE':
        dir=157.5
    if i=='SW':
        dir=225.0
    if i=='WSW':
        dir=247.5
    if i=='W':
        dir=270.0
    if i=='WNW':
        dir=292.5
    if i=='NW':
        dir=315.0
    if i=='NNW':
        dir=337.5
    direccion.append(dir)
data['DirViento']=direccion
I get the following:
data['DirViento'].head()
0 67.5
1 67.5
2 67.5
3 67.5
4 67.5
Is this because the missing data is assigned the value of the other rows? I compute the components with the following code:
Vviento=[]
Uviento=[]
for i in range(0,len(data['MagViento'])):
    Uviento.append((data['MagViento'][i]*sin((data['DirViento'][i]+180)*(pi/180.0))))
    Vviento.append((data['MagViento'][i]*cos((data['DirViento'][i]+180)*(pi/180.0))))
data['PromeU']=Uviento
data['PromeV']=Vviento
Next, I group the data to obtain statistics:
index=data.set_index(['Fecha','Hora'],inplace=True)
g = index.groupby(level=0)
but I get an error:
IndexError: index out of range for array
Am I doing something wrong? How can I perform these operations without taking the missing data into account?
Upvotes: 2
Views: 165
Reputation: 117345
I see one flaw in your code. Your conditional statement should look more like this:
if i == 'SSW':
    dir = 202.5
elif i == 'S':
    ...
else:
    dir = np.nan
Or you can reset the dir variable at the beginning of each loop iteration. Otherwise, dir for a row with missing data will keep the value it had in the previous iteration.
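To make the stale-value effect concrete, here is a minimal sketch. The function names `convert_buggy` and `convert_fixed` and the two-direction lookup are hypothetical and shortened for illustration; only the pattern (independent `if` statements versus a reset plus `elif`) is taken from the code above.

```python
import math

def convert_buggy(labels):
    """Independent ifs, no reset: unmatched labels reuse the previous value."""
    out = []
    for label in labels:
        if label == 'N':
            deg = 360.0
        if label == 'S':
            deg = 180.0
        # For '---' no branch ran, so deg still holds the last assigned value.
        out.append(deg)
    return out

def convert_fixed(labels):
    """Reset to NaN on every iteration so missing data stays missing."""
    out = []
    for label in labels:
        deg = math.nan  # default for anything not recognised below
        if label == 'N':
            deg = 360.0
        elif label == 'S':
            deg = 180.0
        out.append(deg)
    return out

labels = ['N', '---', 'S']
buggy = convert_buggy(labels)
fixed = convert_fixed(labels)
print(buggy)  # [360.0, 360.0, 180.0]
print(fixed)  # [360.0, nan, 180.0]
```

Note that the buggy version would raise a NameError if the very first label were unrecognised, since deg would not yet exist.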
But I think this code could be written in a more Pythonic way, for example like this:
# test DataFrame
df = pd.DataFrame({'DirViento': ['N', 'N', 'N', 'N', '--', '--', 'S', 'SE']})
DirViento
0 N
1 N
2 N
3 N
4 --
5 --
6 S
7 SE
# create points of compass list
dir_lst = ['NNE','NE','ENE','E','ESE','SE','SSE','S','SSW','SW','WSW','W','WNW','NW','NNW','N']
# create dictionary from it
dir_dict = {x: (i + 1) * 22.5 for i, x in enumerate(dir_lst)}
# add a new column
df['DirViento2'] = df['DirViento'].apply(lambda x: dir_dict.get(x, None))
DirViento DirViento2
0 N 360
1 N 360
2 N 360
3 N 360
4 -- NaN
5 -- NaN
6 S 180
7 SE 135
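Update: a good suggestion from @DanAllan in the comments makes the code even shorter. Note that, unlike the dictionary lookup above, replace leaves unmatched values such as '--' unchanged instead of turning them into NaN: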
df['DirViento2'] = df['DirViento'].replace(dir_dict)
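Once the directions are numeric with NaN for missing rows, the u and v components can be computed without an explicit loop. The following is only a sketch under some assumptions: the sample values are taken from the question, the column name `DirGrados` is hypothetical, and `Series.map` is used instead of `replace` so that labels not found in the dictionary become NaN.

```python
import numpy as np
import pandas as pd

# Small sample in the shape of the question's data ('---' marks missing data).
df = pd.DataFrame({
    'DirViento': ['N', 'N', '---', 'S', 'SE'],
    'MagViento': [6.6, 5.5, 6.0, 4.7, 3.1],
})

# 16 compass points, 22.5 degrees apart, ending with N = 360.
dir_lst = ['NNE', 'NE', 'ENE', 'E', 'ESE', 'SE', 'SSE', 'S',
           'SSW', 'SW', 'WSW', 'W', 'WNW', 'NW', 'NNW', 'N']
dir_dict = {name: (i + 1) * 22.5 for i, name in enumerate(dir_lst)}

# map() returns NaN for any label that is not a key of the dictionary.
df['DirGrados'] = df['DirViento'].map(dir_dict)

# Same formula as in the question, applied to whole columns at once.
# NaN directions propagate, so the missing rows get NaN components.
rad = np.deg2rad(df['DirGrados'] + 180)
df['PromeU'] = df['MagViento'] * np.sin(rad)
df['PromeV'] = df['MagViento'] * np.cos(rad)

# pandas reductions skip NaN by default (skipna=True).
mean_u = df['PromeU'].mean()
mean_v = df['PromeV'].mean()
print(df)
print(mean_u, mean_v)
```

Because the missing rows carry NaN rather than a stale direction, statistics such as mean() ignore them without any extra filtering.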
Upvotes: 1