Reputation: 51
I am fairly new to Python and I am trying to build a function that checks the first 2 digits of each element in a column and, when they match, writes a result into a new column such as Region.
For example,
   Adres  AreaCode Region
0  SArea    123191      A
1  BArea    122929      A
2  AArea    132222      B
I want the function to look at just the first 2 digits of the AreaCode and fill a new Region column that classifies each row based on those digits. So in this case 12 would give me A and 13 would give me B.
I already tried this
df.loc[df.AreaCode == 123191, 'Region'] = 'A'
and this worked for the entire AreaCode but I have no idea how to modify it so that I would be able to search based on the first 2 digits.
and I tried this
df.loc[df.AreaCode.str.contains == 12, 'Region' ] = 'A'
but it gives me the error:
AttributeError: Can only use .str accessor with string values,
which use np.object_ dtype in pandas
How do I fix this? Thanks a lot for helping!
Upvotes: 3
Views: 138
Reputation: 11907
First convert the data type to str, like this
df.AreaCode = df.AreaCode.astype('str')
Then check for the number at the beginning like this
df.loc[df.AreaCode.str.startswith('12'), 'Region'] = 'A'
Assuming you need NaN in the rows which don't start with 12, you can do a map like this (with numpy imported as np)
df['Region'] = df['AreaCode'].map(lambda x : 'A' if x.startswith('12') else np.nan )
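To cover several prefixes at once, the same idea can be sketched with a lookup dictionary. This is only a minimal sketch, assuming the sample data from the question; the name prefix_to_region is hypothetical and introduced here for illustration. Prefixes with no entry in the dictionary come out as NaN.

```python
import pandas as pd

# Sample data from the question; AreaCode is stored as integers.
df = pd.DataFrame({
    "Adres": ["SArea", "BArea", "AArea"],
    "AreaCode": [123191, 122929, 132222],
})

# Hypothetical lookup table: first two digits -> region label.
prefix_to_region = {"12": "A", "13": "B"}

# Convert to string, slice the first two characters, then map them.
# Prefixes missing from the dictionary become NaN.
df["Region"] = df["AreaCode"].astype(str).str[:2].map(prefix_to_region)

print(df)
```

Slicing with .str[:2] avoids writing one df.loc line per prefix, at the cost of only supporting prefixes of a fixed length.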
Upvotes: 0
Reputation: 94
See if this helps -
First convert Area code column dtype to string with
df.AreaCode = df.AreaCode.astype(str)
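And then filter on the first characters of the column. Note that this only selects the rows whose AreaCode starts with 12 and whose Region is already 'A'; it does not assign anything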
df.loc[(df.AreaCode.str.startswith('12')) & (df.Region=='A')]
Upvotes: 2
Reputation: 376
Try this
df.loc[df.AreaCode.astype(str).str.startswith("12"), 'Region'] = 'A'
The line below gives you a series with True/False for each row, which becomes the filter for the dataframe.
df.AreaCode.astype(str).str.startswith("12")
Passing that boolean series as the row indexer to df.loc is what makes it a filter; comparing it with == True is not needed.
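To make the True/False series concrete, here is a small sketch, assuming the three AreaCode values from the question; the variable name mask is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"AreaCode": [123191, 122929, 132222]})

# One True/False value per row: does the code start with "12"?
mask = df["AreaCode"].astype(str).str.startswith("12")
print(mask.tolist())  # [True, True, False]

# Using the mask as the row indexer and assigning sets Region
# only where the mask is True; the other rows are left as NaN.
df.loc[mask, "Region"] = "A"
print(df)
```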
Upvotes: 2
Reputation: 11232
I tried this df.loc[df.AreaCode.str.contains == 12, 'Region' ] = 'A' but it gives me the error: AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
You could simply convert it to a string, then use almost the same code, with startswith in place of contains:
df.loc[df.AreaCode.astype(str).str.startswith('12'), 'Region' ] = 'A'
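Since the question asks for a function, the line above could be wrapped roughly as follows. This is a sketch under assumptions, not a definitive implementation: add_region, its parameters, and the prefix dictionary are hypothetical names made up for this example.

```python
import pandas as pd


def add_region(df, prefix_to_region, code_col="AreaCode", region_col="Region"):
    """Return a copy of df with region_col filled from the leading digits of code_col."""
    out = df.copy()
    # Temporary string version of the codes; the column itself stays numeric.
    codes = out[code_col].astype(str)
    for prefix, region in prefix_to_region.items():
        # Assign the region label to every row whose code starts with this prefix.
        out.loc[codes.str.startswith(prefix), region_col] = region
    return out


df = pd.DataFrame({
    "Adres": ["SArea", "BArea", "AArea"],
    "AreaCode": [123191, 122929, 132222],
})
result = add_region(df, {"12": "A", "13": "B"})
print(result)
```

Working on a copy leaves the original dataframe untouched, and rows that match none of the prefixes keep NaN in the Region column.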
Upvotes: 2
Reputation: 156
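This should work, provided AreaCode has already been converted to strings (on an integer column, .str raises the same AttributeError).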
df.loc[df.AreaCode.str.startswith('12'), 'Region' ] = 'A'
Upvotes: 1