Reputation: 2117
I have a dataframe with file extensions. Some have periods in them, and I am trying to create a new column flagging whether each one contains a period. If I wanted to just get the rows that contain a period, I would use: send_rec_file_url[send_rec_file_url['file_name'].str.contains('\.')]
How do I create a new column like below?
df
file_name
0 png
1 jpg
2 jpg
3 pdf
4 pdf
5 xlsx
6 docx.pdf
7 txt.scf
8 pdf
9 TXT.vbs
10 read_this.pdf
Desired output:
df
file_name has_period
0 png no
1 jpg no
2 jpg no
3 pdf no
4 pdf no
5 xlsx no
6 docx.pdf yes
7 txt.scf yes
8 pdf no
9 TXT.vbs yes
10 read_this.pdf yes
Upvotes: 0
Views: 117
Reputation: 153460
You can try:
df['has_period'] = ["Yes" if '.' in i else "No" for i in df['file_name']]
Output:
file_name has_period
0 png No
1 jpg No
2 jpg No
3 pdf No
4 pdf No
5 xlsx No
6 docx.pdf Yes
7 txt.scf Yes
8 pdf No
9 TXT.vbs Yes
10 read_this.pdf Yes
Note: the pandas .str accessor is fairly slow, so this list comprehension should outperform .str-based solutions.
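One caveat with the list comprehension: '.' in i raises a TypeError if the column holds a missing value (None or NaN). A minimal sketch of a guarded version follows, assuming you want non-string entries flagged as "No". The helper name flag_period and the sample data are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, including one missing value
df = pd.DataFrame({"file_name": ["png", "jpg", "docx.pdf", None, "read_this.pdf"]})

def flag_period(value):
    # Assumption: anything that is not a string (None, NaN) counts as "No"
    return "Yes" if isinstance(value, str) and "." in value else "No"

df["has_period"] = [flag_period(v) for v in df["file_name"]]
print(df)
```

If the column is guaranteed to contain only strings, the one-liner above is enough.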
Upvotes: 2
Reputation: 8631
Use a boolean mask to set the column's value: initialize the column to 'no', then assign 'yes' to the rows where the mask is true.
df['has_period'] = 'no'
df.loc[df['file_name'].str.contains(r'\.'), 'has_period'] = 'yes'
Output:
file_name has_period
0 png no
1 jpg no
2 jpg no
3 pdf no
4 pdf no
5 xlsx no
6 docx.pdf yes
7 txt.scf yes
8 pdf no
9 TXT.vbs yes
10 read_this.pdf yes
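The same mask can also build the column in one step. This is a sketch of an alternative, not part of the approach above: it swaps in numpy.where and passes regex=False so the period is matched literally and needs no escaping. The sample data is abbreviated from the question.

```python
import numpy as np
import pandas as pd

# Abbreviated sample data from the question
df = pd.DataFrame(
    {"file_name": ["png", "jpg", "docx.pdf", "txt.scf", "pdf", "TXT.vbs", "read_this.pdf"]}
)

# regex=False treats "." as a literal character instead of "any character"
mask = df["file_name"].str.contains(".", regex=False)

# Pick "yes" where the mask is True, "no" elsewhere
df["has_period"] = np.where(mask, "yes", "no")
print(df)
```

This assumes the column has no missing values; if it might, passing na=False to str.contains keeps the mask strictly boolean.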
Upvotes: 3