Reputation: 621
Let's say I have a DataFrame like this:
full_path
0 C:\Users\User\Desktop\Test1\1.txt
1 C:\Users\User\Desktop\ABC\1.txt
2 C:\Users\User\Desktop\Test2\1.txt
3 C:\Users\User\Desktop\Test1\1.txt
4 C:\Users\User\Desktop\ABCD\1.txt
5 C:\Users\User\Desktop\Test2\1.txt
I want to check whether the 5th element of the path is either Test1 or Test2, and create a column like the one below:
   full_path                          folder
0  C:\Users\User\Desktop\Test1\1.txt  Test1
1  C:\Users\User\Desktop\ABC\1.txt
2  C:\Users\User\Desktop\Test2\1.txt  Test2
3  C:\Users\User\Desktop\Test1\1.txt  Test1
4  C:\Users\User\Desktop\ABCD\1.txt
5  C:\Users\User\Desktop\Test2\1.txt  Test2
I tried this command:
df['folder'] = df['full_path'].str.rsplit('\\').str[4]
but it gives me this output:
   full_path                          folder
0  C:\Users\User\Desktop\Test1\1.txt  Test1
1  C:\Users\User\Desktop\ABC\1.txt    ABC
2  C:\Users\User\Desktop\Test2\1.txt  Test2
3  C:\Users\User\Desktop\Test1\1.txt  Test1
4  C:\Users\User\Desktop\ABCD\1.txt   ABCD
5  C:\Users\User\Desktop\Test2\1.txt  Test2
I don't want folders other than Test1 and Test2 to appear in the folder column.
Upvotes: 0
Views: 31
Reputation: 1875
You can use NumPy's where to keep the folder name only for paths that contain 'Test' and fill the rest with NaN:
import numpy as np

# Keep the 5th path component only where the path contains 'Test'; use NaN elsewhere.
df['folder'] = np.where(
    df['full_path'].str.contains('Test'),
    df['full_path'].str.rsplit('\\').str[4],
    np.nan,
)
Output:
   full_path                          folder
0  C:\Users\User\Desktop\Test1\1.txt  Test1
1  C:\Users\User\Desktop\ABC\1.txt    NaN
2  C:\Users\User\Desktop\Test2\1.txt  Test2
3  C:\Users\User\Desktop\Test1\1.txt  Test1
4  C:\Users\User\Desktop\ABCD\1.txt   NaN
5  C:\Users\User\Desktop\Test2\1.txt  Test2
Upvotes: 1