Reputation: 12407
I created a Pandas dataframe and would like to filter for some values. The data frame contains 4 columns namely currency port supplier_id value
and I would like to have values that would entertain that conditions provided below,
* port – expressed as a portcode, a 5-letter string uniquely identifying a port. Portcodes consist of 2-letter country code and 3-letter city code.
* supplier_id - integer, uniquely identifying the provider of the information
* currency - 3-letter string identifying the currency
* value - a floating-point number
df = df[ (len(df['port']) == 5 & isinstance(df['port'], basestring)) & \
isinstance(df['supplier_id'], int) & \
(len(df['currency']) == 3 & isinstance(df['currency'], basestring))\
isinstance(df['value'], float) ]
The snippet of the code should be obvious and trying to implement the conditions mentioned earlier but it doesn't work. The print from the df
is provided below,
currency port supplier_id value
0 CNY CNAQG 35.0 820.0
1 CNY CNAQG 19.0 835.0
2 CNY CNAQG 49.0 600.0
3 CNY CNAQG 54.0 775.0
4 CNY CNAQG 113.0 785.0
5 CNY CNAQG 5.0 790.0
6 CNY CNAQG 55.0 770.0
7 CNY CNAQG 81.0 810.0
8 CNY CNAQG 2.0 770.0
9 CNY CNAQG 10.0 825.0
print df[df.supplier_id.isnull()] # prints below
Empty DataFrame
Columns: [currency, port, supplier_id, value]
Index: []
df.info() # prints below
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6661 entries, 0 to 6660
Data columns (total 4 columns):
currency 6661 non-null object
port 6661 non-null object
supplier_id 6661 non-null float64
value 6661 non-null float64
dtypes: float64(2), object(2)
memory usage: 208.2+ KB
None
How to write it properly?
Upvotes: 1
Views: 2154
Reputation: 862611
You can use if have mixed values in one column - numeric with strings:
df = pd.DataFrame({'port':['aa789',2,3],
'supplier_id':[4,'s',6],
'currency':['USD',8,9],
'value':[1.7,3,5]})
print (df)
currency port supplier_id value
0 USD aa789 4 1.7
1 8 2 s 3.0
2 9 3 6 5.0
#for python 2 change str to basestring
m1 = (df.port.astype(str).str.len() == 5) & (df.port.apply(lambda x :isinstance(x, str)))
m2 = df.supplier_id.apply(lambda x : isinstance(x, int))
m3=(df.currency.astype(str).str.len() == 3)&(df.currency.apply(lambda x :isinstance(x, str)))
m4 = df.value.apply(lambda x : isinstance(x, float))
mask = m1 & m2 & m3 & m4
print (mask)
0 True
1 False
2 False
dtype: bool
print (df[mask])
currency port supplier_id value
0 USD aa789 4 1.7
Upvotes: 3