Reputation: 51
I have a pandas dataframe with a column called 'picture'; that column has values that either start with a number or letter. What I'm trying to do is create a new column that checks whether or not the value starts with a letter or number, and populate that new column accordingly. I'm using np.where, and my code is below (raw_master is the dataframe, 'database' is the new column):
def iaps_or_naps(x):
    # x is expected to be a single character
    if x in ["0","1","2","3","4","5","6","7","8","9"]:
        return True
    else:
        return False
raw_master['database'] = np.where(iaps_or_naps(raw_master.picture[?][0])==True, 'IAPS', 'NAPS')
My issue is that if I just do raw_master.picture[0], that checks the value of the entire string, which is not what I need. I need the first character; however, if I do raw_master.picture[0][0], that will just evaluate to the first character of the first row for the whole dataframe. BTW, the question mark just means I'm not sure what to put there.
How can I get it so it takes the first character of the string for every row?
Thanks so much!
Upvotes: 3
Views: 1247
Reputation: 3281
You could use a mapping function such as apply, which iterates over each element in the column, so you can access the first character of each value with indexing [0]:
df['new_col'] = df['picture'].apply(lambda x: 'IAPS' if x[0].isdigit() else 'NAPS')
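Here is a minimal, self-contained sketch of the apply approach. It assumes every value in picture is a string; the sample file names and the helper label_database are made up for illustration. Inside apply each element is a plain Python string, so string methods are called on it directly, and slicing with [:1] avoids an IndexError on empty strings.

```python
import pandas as pd

# Hypothetical sample data standing in for raw_master
raw_master = pd.DataFrame(
    {"picture": ["1050.jpg", "Animals_001", "7380.jpg", "Faces_12"]}
)

def label_database(name: str) -> str:
    # name[:1] is the first character, or '' when the string is empty
    return "IAPS" if name[:1].isdigit() else "NAPS"

# apply calls label_database once per row of the column
raw_master["database"] = raw_master["picture"].apply(label_database)
print(raw_master)
```

This runs a Python function per row, so on large frames it will likely be slower than a vectorized .str expression.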
Upvotes: 0
Reputation: 5359
You don't need to write your own function for this. Take this small df as an example:
s = pd.DataFrame(['3asd', 'asd', '3423', 'a123'])
looks like:
      0
0  3asd
1   asd
2  3423
3  a123
using a pandas builtin:
# checking first column, s[0], first letter, str[0], to see if it is digit.
# if so, assigning IAPS, if not, assigning NAPS
s['database'] = np.where(s[0].str[0].str.isdigit(), 'IAPS', 'NAPS')
output:
      0 database
0  3asd     IAPS
1   asd     NAPS
2  3423     IAPS
3  a123     NAPS
Applying this to your dataframe:
raw_master['database'] = np.where(raw_master['picture'].str[0].str.isdigit(), 'IAPS', 'NAPS')
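If the picture column might contain missing values or empty strings, the mask can be built so those rows come out as a definite False. The sketch below is one way to do that, using Series.str.match with na=False instead of .str[0].str.isdigit(). The sample values are hypothetical, and treating missing rows as 'NAPS' is an assumption, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data including a missing value and an empty string
raw_master = pd.DataFrame({"picture": ["1050.jpg", "Animals_001", None, ""]})

# str.match anchors the pattern at the start of each string;
# na=False makes missing entries evaluate to False instead of NaN
starts_with_digit = raw_master["picture"].str.match(r"\d", na=False)

raw_master["database"] = np.where(starts_with_digit, "IAPS", "NAPS")
print(raw_master)
```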
Upvotes: 4
Reputation: 23099
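IIUC you can just test whether the first character is numeric using pd.to_numeric. With errors='coerce', anything that is not a number becomes NaN, so notnull() is True only where the first character is a digit:
np.where(pd.to_numeric(df['your_col'].str[0], errors='coerce').notnull(), 'IAPS', 'NAPS')
# first character is a number -> 'IAPS', otherwise -> 'NAPS'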
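To make the direction of the mask explicit, here is a small step-by-step sketch on the example frame from the other answer. The intermediate names (first_char, as_number, is_digit) are only for illustration; the labels follow the question, where a leading digit means 'IAPS'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"your_col": ["3asd", "asd", "3423", "a123"]})

# First character of every row
first_char = df["your_col"].str[0]

# Digits become numbers; anything else is coerced to NaN
as_number = pd.to_numeric(first_char, errors="coerce")

# True where the first character parsed as a number
is_digit = as_number.notna()

df["database"] = np.where(is_digit, "IAPS", "NAPS")
print(df)
```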
Upvotes: 1