gfernandes

Reputation: 183

Splitting a string into a column in Python

I have a quick question. I have the DataFrame below:

df

    File_Path
0   /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh
1   /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh
2   /data/app_next_best_action/call_nba_as11.sh
3   /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing
4   sh /data/processos/current/aplicacao/AAVR/ACN10/scr/exec_fim_grupo.sh ACN10_ARQ_1   

I want to get the path up to and including the 4th item of the tree structure in the File_Path column.

The output should look like this:

df

    File_Path                                                                                                       Parent_path
0   /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh  /data/application/AANX/aanx-dataeng-slas-sysyphus/
1   /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh  /data/application/AANX/aanx-dataeng-slas-sysyphus/
2   /data/app_next_best_action/call_nba_as11.sh                                                                     /data/app_next_best_action/call_nba_as11.sh
3   /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh                             /data/application/AAIN/aain-srv-motor-extracao-next/
4   sh /data/processos/current/aplicacao/AAVR/ACN10/scr/exec_fim_grupo.sh ACN10_ARQ_1                               /data/processos/current/aplicacao/

At index 2 there is no 4th item, so it takes the last one, which is the file call_nba_as11.sh.

Also, at index 4 there is a "sh " at the beginning of the File_Path value, which I need to skip.

Could you help me?

Upvotes: 1

Views: 82

Answers (1)

mozway

Reputation: 260335

You can use a regex with str.extract:

df['Parent_path'] = df['File_Path'].str.extract(r'^((?:/[^/]+){,4}/?)')

Output:

                                                                                                        File_Path                                           Parent_path
0  /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh    /data/application/AANX/aanx-dataeng-slas-sysyphus/
1  /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh    /data/application/AANX/aanx-dataeng-slas-sysyphus/
2                                                                     /data/app_next_best_action/call_nba_as11.sh           /data/app_next_best_action/call_nba_as11.sh
3                    /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing  /data/application/AAIN/aain-srv-motor-extracao-next/
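To see what each part of that pattern does, here is a minimal sketch using only the standard `re` module on plain strings. The helper name `parent_path` is hypothetical and only for illustration. It also shows that, because `{,4}` allows zero repetitions, a value that does not start with `/` yields an empty string.

```python
import re

# Same pattern as above, spelled out in verbose mode.
PATTERN = re.compile(
    r"""
    ^                    # start of the string
    (                    # capture group: the prefix to keep
      (?:/[^/]+){,4}     # up to four "/component" pieces (zero is allowed)
      /?                 # optional trailing slash
    )
    """,
    re.VERBOSE,
)


def parent_path(path: str) -> str:
    """Return the captured prefix (hypothetical helper, for illustration)."""
    match = PATTERN.match(path)
    return match.group(1) if match else ""


# Four components are available, so the match stops after the 4th one.
print(parent_path(
    "/data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing"
))

# Only three components exist, so the whole path is returned.
print(parent_path("/data/app_next_best_action/call_nba_as11.sh"))

# The string starts with "sh ", not "/", so the group matches nothing.
print(repr(parent_path(
    "sh /data/processos/current/aplicacao/AAVR/ACN10/scr/exec_fim_grupo.sh ACN10_ARQ_1"
)))
```

Note that `[^/]+` matches any character except a slash, so a component containing spaces would be captured as-is.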


Alternative, which also skips anything before the first slash (such as the leading "sh " at index 4):

df['Parent_path'] = df['File_Path'].str.extract(r'^[^/]*((?:/[^/]+){,4}/?)')

Output:

                                                                                                        File_Path                                           Parent_path
0  /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh    /data/application/AANX/aanx-dataeng-slas-sysyphus/
1  /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh    /data/application/AANX/aanx-dataeng-slas-sysyphus/
2                                                                     /data/app_next_best_action/call_nba_as11.sh           /data/app_next_best_action/call_nba_as11.sh
3                    /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing  /data/application/AAIN/aain-srv-motor-extracao-next/
4                               sh /data/processos/current/aplicacao/AAVR/ACN10/scr/exec_fim_grupo.sh ACN10_ARQ_1                    /data/processos/current/aplicacao/
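For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample data from the question and applies the second pattern. It assumes pandas is installed and passes `expand=False` so that `str.extract` returns a Series rather than a one-column DataFrame. That is a small deviation from the one-liner above, not a required change.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "File_Path": [
        "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh",
        "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh",
        "/data/app_next_best_action/call_nba_as11.sh",
        "/data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing",
        "sh /data/processos/current/aplicacao/AAVR/ACN10/scr/exec_fim_grupo.sh ACN10_ARQ_1",
    ]
})

# ^[^/]* consumes anything before the first slash (for example "sh "),
# then the capture group keeps up to four "/component" pieces
# plus an optional trailing slash.
pattern = r"^[^/]*((?:/[^/]+){,4}/?)"
df["Parent_path"] = df["File_Path"].str.extract(pattern, expand=False)

for value in df["Parent_path"]:
    print(value)
```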

Upvotes: 3
