Arefe
Arefe

Reputation: 12407

How to filter column values in the pandas dataframe with certain conditions?

I created a Pandas dataframe and would like to filter for some values. The data frame contains 4 columns namely currency port supplier_id value and I would like to have values that would entertain that conditions provided below,

* port – expressed as a portcode, a 5-letter string uniquely identifying a port. Portcodes consist of 2-letter country code and 3-letter city code.
* supplier_id - integer, uniquely identifying the provider of the information
* currency - 3-letter string identifying the currency
* value - a floating-point number

df =  df[ (len(df['port']) == 5 & isinstance(df['port'], basestring)) & \
  isinstance(df['supplier_id'], int) & \
  (len(df['currency']) == 3 & isinstance(df['currency'], basestring))\
  isinstance(df['value'], float) ]

The snippet of the code should be obvious and trying to implement the conditions mentioned earlier but it doesn't work. The print from the df is provided below,

     currency   port  supplier_id   value
0         CNY  CNAQG         35.0   820.0
1         CNY  CNAQG         19.0   835.0
2         CNY  CNAQG         49.0   600.0
3         CNY  CNAQG         54.0   775.0
4         CNY  CNAQG        113.0   785.0
5         CNY  CNAQG          5.0   790.0
6         CNY  CNAQG         55.0   770.0
7         CNY  CNAQG         81.0   810.0
8         CNY  CNAQG          2.0   770.0
9         CNY  CNAQG         10.0   825.0


print df[df.supplier_id.isnull()] # prints below 
Empty DataFrame
Columns: [currency, port, supplier_id, value]
Index: []



df.info() # prints below     
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6661 entries, 0 to 6660
Data columns (total 4 columns):
currency       6661 non-null object
port           6661 non-null object
supplier_id    6661 non-null float64
value          6661 non-null float64
dtypes: float64(2), object(2)
memory usage: 208.2+ KB
None

How to write it properly?

Upvotes: 1

Views: 2154

Answers (1)

jezrael
jezrael

Reputation: 862611

You can use if have mixed values in one column - numeric with strings:

df = pd.DataFrame({'port':['aa789',2,3],
                   'supplier_id':[4,'s',6],
                   'currency':['USD',8,9],
                   'value':[1.7,3,5]})

print (df)
  currency   port supplier_id  value
0      USD  aa789           4    1.7
1        8      2           s    3.0
2        9      3           6    5.0

#for python 2 change str to basestring
m1 = (df.port.astype(str).str.len() == 5) & (df.port.apply(lambda x :isinstance(x, str)))
m2 = df.supplier_id.apply(lambda x : isinstance(x, int))
m3=(df.currency.astype(str).str.len() == 3)&(df.currency.apply(lambda x :isinstance(x, str)))
m4 = df.value.apply(lambda x : isinstance(x, float))
mask = m1 & m2 & m3 & m4
print (mask)
0     True
1    False
2    False
dtype: bool

print (df[mask])
  currency   port supplier_id  value
0      USD  aa789           4    1.7

Upvotes: 3

Related Questions