Brian Winston

Reputation: 51

Looking at the first character of a string for every element in a list

I have a pandas dataframe with a column called 'picture'; that column has values that either start with a number or letter. What I'm trying to do is create a new column that checks whether or not the value starts with a letter or number, and populate that new column accordingly. I'm using np.where, and my code is below (raw_master is the dataframe, 'database' is the new column):

def iaps_or_naps(x):
    if x in ["1","2","3","4","5","6","7","8","9"]:
        return True
    else:
        return False
raw_master['database'] = np.where(iaps_or_naps(raw_master.picture[?][0])==True, 'IAPS', 'NAPS')

My issue is that if I just do raw_master.picture[0], that checks the value of the entire string, which is not what I need. I need the first character; however, if I do raw_master.picture[0][0], that will just evaluate to the first character of the first row for the whole dataframe. BTW, the question mark just means I'm not sure what to put there.

How can I get it so it takes the first character of the string for every row?

Thanks so much!

Upvotes: 3

Views: 1247

Answers (3)

alex067

Reputation: 3281

You could use a mapping function such as apply, which calls a function on each element of the column. Inside that function you can get the first character of the string with indexing [0]:

# x is a plain Python string here, so call isdigit() on it directly (no .str accessor)
df['new_col'] = df['picture'].apply(lambda x: 'IAPS' if x[0].isdigit() else 'NAPS')
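
One thing to watch with x[0] is that it raises an IndexError on an empty string. A minimal sketch of a guarded variant follows; the sample data and the helper name label are made up for illustration, and it assumes every value in the column is a string:

```python
import pandas as pd

# Hypothetical sample data; the real values of 'picture' are not shown in the question.
df = pd.DataFrame({'picture': ['3asd', 'asd', '', '9x']})

def label(value):
    # value[:1] is '' for an empty string, and ''.isdigit() is False,
    # so empty strings fall through to 'NAPS' instead of raising IndexError.
    return 'IAPS' if value[:1].isdigit() else 'NAPS'

df['new_col'] = df['picture'].apply(label)
print(df)
```

Slicing with [:1] instead of indexing with [0] is the only change from the one-liner above.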

Upvotes: 0

d_kennetz

Reputation: 5359

You don't need to write your own function for this. Take this small df as an example:

s = pd.DataFrame(['3asd', 'asd', '3423', 'a123'])

looks like:

      0
0  3asd
1   asd
2  3423
3  a123

using a pandas builtin:

# take the first column, s[0], then the first character of each value, .str[0],
# and check whether it is a digit, .str.isdigit().
# if so, assign 'IAPS'; if not, assign 'NAPS'
s['database'] = np.where(s[0].str[0].str.isdigit(), 'IAPS', 'NAPS')

output:

      0 database
0  3asd     IAPS
1   asd     NAPS
2  3423     IAPS
3  a123     NAPS

Applying this to your dataframe:

raw_master['database'] = np.where(raw_master['picture'].str[0].str.isdigit(), 'IAPS', 'NAPS')
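
Note that str.isdigit() also treats "0" as a digit (and some non-ASCII characters such as "²"), whereas the function in the question only lists "1" through "9". If you want to keep exactly that set, a sketch along the following lines might work; the contents of raw_master here are invented, and only the column names come from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for raw_master; only the column name comes from the question.
raw_master = pd.DataFrame({'picture': ['3asd', 'asd', '0423', 'a123']})

# Same characters as the question's iaps_or_naps (note that "0" is not among them).
digits = list("123456789")

# .str[0] takes the first character of every row; isin() checks it against the list.
starts_with_digit = raw_master['picture'].str[0].isin(digits)
raw_master['database'] = np.where(starts_with_digit, 'IAPS', 'NAPS')
print(raw_master)
```

With this variant a value like '0423' is labelled 'NAPS'; add "0" to digits if that is not what you want.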

Upvotes: 4

Umar.H

Reputation: 23099

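IIUC you can just test whether the first character is a number using pd.to_numeric:

# errors='coerce' turns anything that is not a number into NaN,
# so notnull() is True for a number ('IAPS') and False otherwise ('NAPS')
np.where(pd.to_numeric(df['your_col'].str[0], errors='coerce').notnull(), 'IAPS', 'NAPS')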

Upvotes: 1
