ap95btg
ap95btg

Reputation: 27

Check if the name of each column in dataframe contains a substring and change data type

I have a list of substrings and I want to check what columns in my data frame contains any of those substrings so I can change their data types.

For instance, if "Amount" is a substring in the list, then the columns with the names Cred_Amount, Amount_123, Amount should have their data types changed to int.

lst = ["Amount", "ID"]
df = pd.DataFrame({'ID_Loan': [2199, 3994, 8499, 74889, 83479],'Loan_Amount': ['100', '122', '222', '350', '300'],'Team': ['Boston', 'Boston', 'Boston', 'Chele', 'Barse'],'CreditAmount': [89, 79, 113, 78, 84],'College': ['MIT', 'MIT', 'MIT', 'Stanford', 'Stanford'],'ID_Num': [99999, 99994, 89999, 78889, 87779]})

df
   ID_Loan Loan_Amount    Team  CreditAmount   College  ID_Num
0     2199         100  Boston            89       MIT   99999
1     3994         122  Boston            79       MIT   99994
2     8499         222  Boston           113       MIT   89999
3    74889         350   Chele            78  Stanford   78889
4    83479         300   Barse            84  Stanford   87779

I expected that the 1st, 2nd, 4th, and 6th columns be changed to int data type

Upvotes: 0

Views: 184

Answers (1)

cs95
cs95

Reputation: 402663

Use filter to select and convert using astype:

u = df.filter(regex='|'.join(lst))
df[u.columns] = u.astype(int)

df.dtypes

ID_Loan          int64
Loan_Amount      int64
Team            object
CreditAmount     int64
College         object
ID_Num           int64
dtype: object

P.S, I've renamed your column list to lst because using list shadows the builtin which is bad practice.


Another option is using str.contains to get a mask and slice with loc:

m =  df.columns.str.contains('|'.join(lst))
df.loc[:, m] = df.loc[:,m].astype(int)

df.dtypes

ID_Loan          int64
Loan_Amount      int64
Team            object
CreditAmount     int64
College         object
ID_Num           int64
dtype: object

Finally, you can also use astype with a dictionary to selectively convert some columns, but return the whole DataFrame.

df = df.astype(
    dict.fromkeys(df.columns[df.columns.str.contains('|'.join(lst))], 'int'))

df.dtypes

ID_Loan          int64
Loan_Amount      int64
Team            object
CreditAmount     int64
College         object
ID_Num           int64
dtype: object

Upvotes: 1

Related Questions