AskingAndAnswering

Reputation: 45

Python Pandas Regex

I have a pandas DataFrame like the example below. Column 0 contains raw HTML, from which I need to extract all URLs and add them as new columns in the same DataFrame, preserving the row order.

In this case, column 2, row 0 would hold "https://sco...". In reality, a row could contain as many as 10 URLs, each of which should go into its own column of the DataFrame. I tried Beautiful Soup, but I couldn't get it to work reliably with a DataFrame like this.

I also tried the regex below to extract all the URLs, but I couldn't work out how to apply it to the DataFrame.

postsOnlyURL = re.findall('"(http.*?)"',all_text,re.IGNORECASE|re.DOTALL)


                                                    0                                                  1
0   src="https://sco ...                               publicado a 23/10/2019Ident...
1   Ativo</div></div><div class="_7jwu">Começou a ...  AtivoComeçou a ser publicado a 23/10/2019Ident...
2   Ativo</div></div><div class="_7jwu">Começou a ...  AtivoComeçou a ser publicado a 23/10/2019Ident...

Is there a way to make this work?

Upvotes: 1

Views: 153

Answers (2)

ManojK

Reputation: 1640

I can't access your dataset, but in general this is a way to extract URLs from strings in a DataFrame with a regex and create new columns dynamically, according to the number of URLs extracted:

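import pandas as pd

df = pd.DataFrame({'Col1': ['check my blog http://example.com/blah or this is an example of https://google.com or http://facebook.com',
                             'get url from https://facebook.com', 'You can find answers at https://stackoverflow.com/']})

# Match http:// or https:// followed by any run of non-whitespace characters
pattern = r'(https?://[^\s]+)'

# One list of URLs per row
df['urls'] = df['Col1'].str.findall(pattern)
# Expand each list into its own columns; rows with fewer URLs are padded with None/NaN.
# (Joining on ',' and splitting again would break any URL that contains a comma.)
df = pd.concat([df, pd.DataFrame(df['urls'].tolist(), index=df.index)], axis=1)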
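
The same idea can be adapted to an HTML column like the one in the question. This is a minimal sketch, assuming the URLs always appear as double-quoted attribute values; the sample rows and the url_ column prefix are made up for illustration:

```python
import html
import pandas as pd

# Hypothetical sample rows standing in for the HTML column in the question.
df = pd.DataFrame({0: [
    '<img src="https://example.com/a.png?x=1&amp;y=2"><a href="http://example.com/page">link</a>',
    '<a href="#">no absolute URL here</a>',
]})

# Capture only double-quoted values that start with http:// or https://.
pattern = r'"(https?://[^"]+)"'

# findall gives one list of matches per row; html.unescape turns &amp; back into &.
url_lists = df[0].str.findall(pattern).apply(
    lambda urls: [html.unescape(u) for u in urls]
)

# One column per URL; rows with fewer URLs are padded with None/NaN.
url_cols = pd.DataFrame(url_lists.tolist(), index=df.index).add_prefix('url_')

# Joining on the index keeps the original row order.
result = df.join(url_cols)
```

A regex like this is quick, but it will also pick up quoted URLs that sit outside href or src attributes, so an HTML parser is the safer choice when that matters.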

Upvotes: 3

Ali Cirik

Reputation: 1572

Here is a potential solution using Beautiful Soup:

import pandas as pd
import re
from bs4 import BeautifulSoup

# Create sample df
a = ["""Ativo</div></div><div class="_7jwu">Começou a ser publicado a <span>23/10/2019</span></div><div class="_8jox"><div aria-describedby="js_m" aria-haspopup="true" class="_4rhp" role="tooltip" tabindex="0">Identificação: 411753089755204</div></div></div><div class="_8k-_"><div class="_3qn7 _61-0 _2fyi _3qng" style="max-width: 120px;"><span data-hover="tooltip"><i class="_3-8_ img sp_-Fn2d835eMD sx_39e484" alt=""></i></span><span data-hover="tooltip"><i class="_3-8_ img sp_-Fn2d835eMD sx_f3b669" alt=""></i></span><span data-hover="tooltip"><i class="img sp_-Fn2d835eMD sx_e31062" alt=""></i></span></div></div></div><div class="_7jwv"><div style="display: inline-block; width: auto;"><button aria-pressed="false" data-testid="SUIAbstractMenu/button" type="button" aria-disabled="false" class="_271k _271l _1o4e _271m _1qjd _7tvm _7tv2 _7tv4" style="width: auto; letter-spacing: normal; color: rgb(68, 73, 80); font-size: 12px; font-weight: bold; font-family: Arial, sans-serif; line-height: 26px; text-align: center; background-color: transparent; border-color: transparent; height: 28px; padding-left: 7px; padding-right: 7px; border-radius: 2px;"><div class="_43rl"><i aria-hidden="true" class="_271o img sp_6UxJZoFesmZ sx_e4448e" alt=""></i><span class="accessible_elem">Abrir menu pendente</span></div></button></div></div></div><div class="_7jwy"><div class="_7jyg _7jyh"><div class="_7k71"><div class="_8nsi _8nqp"><div class="_3qn7 _61-0 _2fyi _3qng" style="width: 100%;"><img alt="imaginBank" class="_8nqq img" src="https://scontent.flis8-1.fna.fbcdn.net/v/t1.6435-9/56757490_843111089374606_3751796641934344192_n.png?_nc_cat=105&amp;_nc_oc=AQn_sfVuUVpGuXh9Xew56gOSFzdktA5s1xfEWBMkYzLNQ6m8zdOZve6xFIzu7IOEJL0&amp;_nc_ht=scontent.flis8-1.fna&amp;oh=ce8b3a3feea1162bc0874c260ab2b308&amp;oe=5EC4B78C"><div class="_3qn7 _61-0 _2fyh _3qnf" style="width: 100%;"><div class="_8nqr _3qn7 _61-3 _2fyi _3qng"><span style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; 
letter-spacing: normal; font-weight: bold; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);"><a data-hovercard="/ajax/hovercard/hovercard.php?id=197438223941899" target="_blank" href="https://www.facebook.com/imaginBank/">imaginBank</a></span></div><div class="_8nrv"><div class="_4ik4 _4ik5" style="-webkit-line-clamp: 2;"><div><span class="_8jos">Patrocinado</span></div></div></div></div></div></div></div><div class="_7jyr"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 16px; max-height: 112px; -webkit-line-clamp: 7;"><div>Parking, peajes, impuestos, gasolina... Al final termina siendo una pasta. ¿Te has planteado recortar estos gastos? No, no hablamos de abandonar la conducción. Hablamos de enchufarnos al futuro. Conoce todos los beneficios de tener un coche un eléctrico y lo fácil que es conseguirlo con un Préstamo Auto de imaginBank. #Enchúfate<br> <br> *La concesión de la operación está sujeta al análisis de la solvencia y de la capacidad de devolución del solicitante, en función de las políticas de riesgo de la entidad. 
imaginBank de CaixaBank</div></div></div></div><div maxchangeamount="1" currentselectedindex="0" class="_23n-"><div class="_4u-c"><div index="0" class="_a28"><div class="_a2e"><div class="_2zgz"><div class="_7jy-"><div class="_7jyr"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 16px; max-height: 112px; -webkit-line-clamp: 7;"> </div></div></div><a target="_blank" class="_231w _231z _4yee" href="http://play.google.com/store/apps/details?id=com.imaginbank.app" style="color: rgb(33, 111, 219);"><img class="_7jys _7jyt img" src="https://scontent.flis8-2.fna.fbcdn.net/v/t39.16868-6/s600x600/68872437_623249314832062_3424786237267902464_n.jpg?_nc_cat=107&amp;_nc_oc=AQnCOg6lOVmyYNmKW9TeJMIQqFnp__ENhA6b0IF9n6OOvKhuFdfBFFn5A-i6mv9Qs9A&amp;_nc_ht=scontent.flis8-2.fna&amp;_nc_tp=7&amp;oh=4c316693ef6d41f08a19c047bbef6ff5&amp;oe=5EC0B4DA" alt=""><div class="_8jgz _8jg_"><div class="_8jh1"><div class="_8jh2"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;">Préstamo desde 3.000€ hasta 30.000€. Solicita el tuyo
  desde la app</div></div></div><div class="_8jh3"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh4"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh5"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div></div><div class="_8jh0"><button type="button" aria-disabled="false" class="_271k _271m _1qjd _3-9a" style="max-width: 80px; letter-spacing: normal; color: rgb(68, 73, 80); font-size: 11px; font-weight: normal; font-family: Arial, sans-serif; line-height: 16px; text-align: center; background-color: rgb(245, 246, 247); border-color: rgb(218, 221, 225); height: 18px; padding-left: 4px; padding-right: 4px; background-clip: padding-box;"><div class="_43rl"><div data-hover="tooltip" data-tooltip-display="overflow" class="_43rm">Use App</div></div></button></div></div></a></div></div><div class="_2zgz"><div class="_7jy-"><div class="_7jyr"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 16px; max-height: 112px; -webkit-line-clamp: 7;"> </div></div></div><a target="_blank" class="_231w _231z _4yee" href="http://play.google.com/store/apps/details?id=com.imaginbank.app" style="color: rgb(33, 111, 219);"><img class="_7jys _7jyt img" src="https://scontent.flis8-1.fna.fbcdn.net/v/t39.16868-6/s600x600/69107399_623249321498728_5143385648069083136_n.jpg?_nc_cat=110&amp;_nc_oc=AQlHwBVTCf9XcxXVP4VH0YnbwivUgg1PXA8uYOxShCkbr9woauh1CiNiQTJbguBYmbc&amp;_nc_ht=scontent.flis8-1.fna&amp;_nc_tp=7&amp;oh=e52db30189225e3525fbef0cec013c31&amp;oe=5EFC734A" alt=""><div class="_8jgz _8jg_"><div class="_8jh1"><div class="_8jh2"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;">Préstamo 
desde 3.000€ hasta 30.000€. Solicita el tuyo
  desde la app</div></div></div><div class="_8jh3"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh4"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh5"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div></div><div class="_8jh0"><button type="button" aria-disabled="false" class="_271k _271m _1qjd _3-9a" style="max-width: 80px; letter-spacing: normal; color: rgb(68, 73, 80); font-size: 11px; font-weight: normal; font-family: Arial, sans-serif; line-height: 16px; text-align: center; background-color: rgb(245, 246, 247); border-color: rgb(218, 221, 225); height: 18px; padding-left: 4px; padding-right: 4px; background-clip: padding-box;"><div class="_43rl"><div data-hover="tooltip" data-tooltip-display="overflow" class="_43rm">Use App</div></div></button></div></div></a></div></div><div class="_2zgz"><div class="_7jy-"><div class="_7jyr"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 16px; max-height: 112px; -webkit-line-clamp: 7;"> </div></div></div><a target="_blank" class="_231w _231z _4yee" href="http://play.google.com/store/apps/details?id=com.imaginbank.app" style="color: rgb(33, 111, 219);"><img class="_7jys _7jyt img" src="https://scontent.flis8-1.fna.fbcdn.net/v/t39.16868-6/s600x600/68744822_623249324832061_2903488387056926720_n.jpg?_nc_cat=109&amp;_nc_oc=AQkQBwWIk_gZ3WxbsRYe6kyjcJk0HU4XjUHDUQEP1diakZkjkk5Ng8U38gF9L3ZWaTI&amp;_nc_ht=scontent.flis8-1.fna&amp;_nc_tp=7&amp;oh=ef1d597a68349654821b7f2a7c730287&amp;oe=5EF8CBA5" alt=""><div class="_8jgz _8jg_"><div class="_8jh1"><div class="_8jh2"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;">Préstamo 
desde 3.000€ hasta 30.000€. Solicita el tuyo
  desde la app</div></div></div><div class="_8jh3"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh4"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh5"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div></div><div class="_8jh0"><button type="button" aria-disabled="false" class="_271k _271m _1qjd _3-9a" style="max-width: 80px; letter-spacing: normal; color: rgb(68, 73, 80); font-size: 11px; font-weight: normal; font-family: Arial, sans-serif; line-height: 16px; text-align: center; background-color: rgb(245, 246, 247); border-color: rgb(218, 221, 225); height: 18px; padding-left: 4px; padding-right: 4px; background-clip: padding-box;"><div class="_43rl"><div data-hover="tooltip" data-tooltip-display="overflow" class="_43rm">Use App</div></div></button></div></div></a></div></div></div></div></div><a class="_32rk _32rh _1cy6" href="#"><div direction="forward" class="_10sf _5x5_"><div class="_5x6d"><div class="_3bwv _3bww"><div class="_3bwy"><div class="_3bwx"><i class="_3-8w img sp_JmF3rXGjoQG sx_77d801" alt=""></i></div></div></div></div></div></a></div></div></div><div class="_7kfi"><div class="_7kd5"></div><a class="_7kfh" data-testid="snapshot_footer_link" href="#"><span style="font-family: Arial, sans-serif; font-size: 13px; line-height: 17px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(24, 119, 242);">Ver Det""",
    """Ativo</div></div><div class="_7jwu">Começou a ser publicado a <span>23/10/2019</span></div><div class="_8jox"><div aria-describedby="js_p" aria-haspopup="true" class="_4rhp" role="tooltip" tabindex="0">Identificação: 712910935875237</div></div></div><div class="_8k-_"><div class="_3qn7 _61-0 _2fyi _3qng" style="max-width: 120px;"><span data-hover="tooltip"><i class="_3-8_ img sp_-Fn2d835eMD sx_39e484" alt=""></i></span><span data-hover="tooltip"><i class="_3-8_ img sp_-Fn2d835eMD sx_f3b669" alt=""></i></span><span data-hover="tooltip"><i class="img sp_-Fn2d835eMD sx_e31062" alt=""></i></span></div></div></div><div class="_7jwv"><div style="display: inline-block; width: auto;"><button aria-pressed="false" data-testid="SUIAbstractMenu/button" type="button" aria-disabled="false" class="_271k _271l _1o4e _271m _1qjd _7tvm _7tv2 _7tv4" style="width: auto; letter-spacing: normal; color: rgb(68, 73, 80); font-size: 12px; font-weight: bold; font-family: Arial, sans-serif; line-height: 26px; text-align: center; background-color: transparent; border-color: transparent; height: 28px; padding-left: 7px; padding-right: 7px; border-radius: 2px;"><div class="_43rl"><i aria-hidden="true" class="_271o img sp_6UxJZoFesmZ sx_e4448e" alt=""></i><span class="accessible_elem">Abrir menu pendente</span></div></button></div></div></div><div class="_7jwy"><div class="_7jyg _7jyh"><div class="_7k71"><div class="_8nsi _8nqp"><div class="_3qn7 _61-0 _2fyi _3qng" style="width: 100%;"><img alt="imaginBank" class="_8nqq img" src="https://scontent.flis8-1.fna.fbcdn.net/v/t1.6435-9/56757490_843111089374606_3751796641934344192_n.png?_nc_cat=105&amp;_nc_oc=AQn_sfVuUVpGuXh9Xew56gOSFzdktA5s1xfEWBMkYzLNQ6m8zdOZve6xFIzu7IOEJL0&amp;_nc_ht=scontent.flis8-1.fna&amp;oh=ce8b3a3feea1162bc0874c260ab2b308&amp;oe=5EC4B78C"><div class="_3qn7 _61-0 _2fyh _3qnf" style="width: 100%;"><div class="_8nqr _3qn7 _61-3 _2fyi _3qng"><span style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; 
letter-spacing: normal; font-weight: bold; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);"><a data-hovercard="/ajax/hovercard/hovercard.php?id=197438223941899" target="_blank" href="https://www.facebook.com/imaginBank/">imaginBank</a></span></div><div class="_8nrv"><div class="_4ik4 _4ik5" style="-webkit-line-clamp: 2;"><div><span class="_8jos">Patrocinado</span></div></div></div></div></div></div></div><div class="_7jyr"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 16px; max-height: 112px; -webkit-line-clamp: 7;"><div>Ya reciclas, vas con tu botella de agua para rellenar y has dejado de usar bolsas de plástico para hacer la compra. ¿Qué sigue? Un coche eléctrico. Entérate por qué molan tanto y #Enchúfate con un Préstamo Auto de imaginBank para hacerte con tu coche eléctrico o híbrido.<br> <br> *La concesión de la operación está sujeta al análisis de la solvencia y de la capacidad de devolución del solicitante, en función de las políticas de riesgo de la entidad. 
imaginBank de CaixaBank</div></div></div></div><div maxchangeamount="1" currentselectedindex="0" class="_23n-"><div class="_4u-c"><div index="0" class="_a28"><div class="_a2e"><div class="_2zgz"><div class="_7jy-"><div class="_7jyr"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 16px; max-height: 112px; -webkit-line-clamp: 7;"> </div></div></div><a target="_blank" class="_231w _231z _4yee" href="http://play.google.com/store/apps/details?id=com.imaginbank.app" style="color: rgb(33, 111, 219);"><img class="_7jys _7jyt img" src="https://scontent.flis8-1.fna.fbcdn.net/v/t39.16868-6/s600x600/69012922_678822735864357_7122692236617187328_n.jpg?_nc_cat=104&amp;_nc_oc=AQketjlSUFzRGTqej50cs1XsD1InX5WgLsjHTd4mL6OWT7-OhrJXFvcz8WyRuRBSsqM&amp;_nc_ht=scontent.flis8-1.fna&amp;_nc_tp=7&amp;oh=cbfe1cf2d19b5c37ede1b2dfdf674276&amp;oe=5EBC43D4" alt=""><div class="_8jgz _8jg_"><div class="_8jh1"><div class="_8jh2"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;">Préstamo desde 3.000€ hasta 30.000€ ¡Solicítalo desde la app!</div></div></div><div class="_8jh3"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh4"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh5"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div></div><div class="_8jh0"><button type="button" aria-disabled="false" class="_271k _271m _1qjd _3-9a" style="max-width: 80px; letter-spacing: normal; color: rgb(68, 73, 80); font-size: 11px; font-weight: normal; font-family: Arial, sans-serif; line-height: 16px; text-align: center; background-color: rgb(245, 246, 247); border-color: rgb(218, 221, 225); height: 18px; 
padding-left: 4px; padding-right: 4px; background-clip: padding-box;"><div class="_43rl"><div data-hover="tooltip" data-tooltip-display="overflow" class="_43rm">Use App</div></div></button></div></div></a></div></div><div class="_2zgz"><div class="_7jy-"><div class="_7jyr"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 16px; max-height: 112px; -webkit-line-clamp: 7;"> </div></div></div><a target="_blank" class="_231w _231z _4yee" href="http://play.google.com/store/apps/details?id=com.imaginbank.app" style="color: rgb(33, 111, 219);"><img class="_7jys _7jyt img" src="https://scontent.flis8-2.fna.fbcdn.net/v/t39.16868-6/s600x600/68897058_678822749197689_3660284254794809344_n.jpg?_nc_cat=102&amp;_nc_oc=AQnuGGaDQSJvqp6qWgRPMeQJ5mGectLDp8RrAPgACaUxLzaXjGrN6r0SaQUAWU7Io_g&amp;_nc_ht=scontent.flis8-2.fna&amp;_nc_tp=7&amp;oh=ab5a43134d95ebef545ff39420bf7f9c&amp;oe=5EBEB413" alt=""><div class="_8jgz _8jg_"><div class="_8jh1"><div class="_8jh2"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;">Préstamo desde 3.000€ hasta 30.000€. Solicita el tuyo
  desde la app</div></div></div><div class="_8jh3"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh4"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh5"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div></div><div class="_8jh0"><button type="button" aria-disabled="false" class="_271k _271m _1qjd _3-9a" style="max-width: 80px; letter-spacing: normal; color: rgb(68, 73, 80); font-size: 11px; font-weight: normal; font-family: Arial, sans-serif; line-height: 16px; text-align: center; background-color: rgb(245, 246, 247); border-color: rgb(218, 221, 225); height: 18px; padding-left: 4px; padding-right: 4px; background-clip: padding-box;"><div class="_43rl"><div data-hover="tooltip" data-tooltip-display="overflow" class="_43rm">Use App</div></div></button></div></div></a></div></div><div class="_2zgz"><div class="_7jy-"><div class="_7jyr"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 16px; max-height: 112px; -webkit-line-clamp: 7;"> </div></div></div><a target="_blank" class="_231w _231z _4yee" href="http://play.google.com/store/apps/details?id=com.imaginbank.app" style="color: rgb(33, 111, 219);"><img class="_7jys _7jyt img" src="https://scontent.flis8-2.fna.fbcdn.net/v/t39.16868-6/s600x600/68874914_678822752531022_5734270725913575424_n.jpg?_nc_cat=108&amp;_nc_oc=AQkv8fSf75LF4KE4JbneYEcjKyFRw7xil-Nq6Q_rhP_qvoe04zH3ZUa4SwRvS8Nq0XU&amp;_nc_ht=scontent.flis8-2.fna&amp;_nc_tp=7&amp;oh=c5ad235fa78dd110248f46bd3913f998&amp;oe=5EF66280" alt=""><div class="_8jgz _8jg_"><div class="_8jh1"><div class="_8jh2"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;">Préstamo 
desde 3.000€ hasta 30.000€. Solicita el tuyo
  desde la app</div></div></div><div class="_8jh3"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh4"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh5"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div></div><div class="_8jh0"><button type="button" aria-disabled="false" class="_271k _271m _1qjd _3-9a" style="max-width: 80px; letter-spacing: normal; color: rgb(68, 73, 80); font-size: 11px; font-weight: normal; font-family: Arial, sans-serif; line-height: 16px; text-align: center; background-color: rgb(245, 246, 247); border-color: rgb(218, 221, 225); height: 18px; padding-left: 4px; padding-right: 4px; background-clip: padding-box;"><div class="_43rl"><div data-hover="tooltip" data-tooltip-display="overflow" class="_43rm">Use App</div></div></button></div></div></a></div></div></div></div></div><a class="_32rk _32rh _1cy6" href="#"><div direction="forward" class="_10sf _5x5_"><div class="_5x6d"><div class="_3bwv _3bww"><div class="_3bwy"><div class="_3bwx"><i class="_3-8w img sp_JmF3rXGjoQG sx_77d801" alt=""></i></div></div></div></div></div></a></div></div></div><div class="_7kfi"><div class="_7kd5"></div><a class="_7kfh" data-testid="snapshot_footer_link" href="#"><span style="font-family: Arial, sans-serif; font-size: 13px; line-height: 17px; letter-spacing: normal; overflow-wrap: normal; text-align: left; color: rgb(24, 119, 242);">Ver Det"""
    ]

b = [
    """AtivoComeçou a ser publicado a 23/10/2019Identificação: 411753089755204Abrir menu pendenteimaginBankPatrocinadoParking, peajes, impuestos, gasolina... Al final termina siendo una pasta. ¿Te has planteado recortar estos gastos? No, no hablamos de abandonar la conducción. Hablamos de enchufarnos al futuro. Conoce todos los beneficios de tener un coche un eléctrico y lo fácil que es conseguirlo con un Préstamo Auto de imaginBank. #Enchúfate *La concesión de la operación está sujeta al análisis de la solvencia y de la capacidad de devolución del solicitante, en función de las políticas de riesgo de la entidad. imaginBank de CaixaBank Préstamo desde 3.000€ hasta 30.000€. Solicita el tuyo
  desde la appUse App Préstamo desde 3.000€ hasta 30.000€. Solicita el tuyo
  desde la appUse App Préstamo desde 3.000€ hasta 30.000€. Solicita el tuyo
  desde la appUse AppVer Det""",
    """AtivoComeçou a ser publicado a 23/10/2019Identificação: 712910935875237Abrir menu pendenteimaginBankPatrocinadoYa reciclas, vas con tu botella de agua para rellenar y has dejado de usar bolsas de plástico para hacer la compra. ¿Qué sigue? Un coche eléctrico. Entérate por qué molan tanto y #Enchúfate con un Préstamo Auto de imaginBank para hacerte con tu coche eléctrico o híbrido. *La concesión de la operación está sujeta al análisis de la solvencia y de la capacidad de devolución del solicitante, en función de las políticas de riesgo de la entidad. imaginBank de CaixaBank Préstamo desde 3.000€ hasta 30.000€ ¡Solicítalo desde la app!Use App Préstamo desde 3.000€ hasta 30.000€. Solicita el tuyo
  desde la appUse App Préstamo desde 3.000€ hasta 30.000€. Solicita el tuyo
  desde la appUse AppVer Det"""
]

df = pd.DataFrame({0: a, 1: b})


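def get_links(x):
    # The parser name is the second argument (`features`); `parser=` is not a BeautifulSoup keyword
    soup = BeautifulSoup(x, 'html.parser')
    # Keep only <a> tags whose href starts with "http"
    links = [i.get('href') for i in soup.find_all('a', attrs={'href': re.compile("^http")})]
    return links

df[0].apply(get_links)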

df[0].apply(get_links) returns

0    [https://www.facebook.com/imaginBank/, http://...
1    [https://www.facebook.com/imaginBank/, http://...
Name: 0, dtype: object

df[1].apply(get_links) returns

0    []
1    []
Name: 1, dtype: object
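
The function above returns one list per row and only looks at <a href>. To get one column per URL, and to also pick up image src values like the one mentioned in the question, something along these lines may work. It is a sketch only: get_urls, the sample rows, and the url_ prefix are illustrative names, not part of the code above.

```python
import pandas as pd
from bs4 import BeautifulSoup


def get_urls(markup):
    """Return absolute http(s) URLs from href and src attributes, in document order."""
    soup = BeautifulSoup(markup, 'html.parser')
    urls = []
    for tag in soup.find_all(True):  # True matches every tag
        for attr in ('href', 'src'):
            value = tag.get(attr)
            if isinstance(value, str) and value.startswith('http'):
                urls.append(value)
    return urls


# Hypothetical sample rows standing in for column 0 of the question's DataFrame.
df = pd.DataFrame({0: [
    '<div><img src="https://example.com/a.png?x=1&amp;y=2"><a href="https://example.com/">x</a></div>',
    '<a href="#">relative only</a>',
]})

url_lists = df[0].apply(get_urls)

# Expand the lists into columns; rows with fewer URLs are padded with None/NaN.
url_cols = pd.DataFrame(url_lists.tolist(), index=df.index).add_prefix('url_')

# Joining on the index keeps each URL on the row it came from.
df = df.join(url_cols)
```

Because the parser decodes entities in attribute values, &amp; in the source comes back as a plain & in the extracted URL.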

Upvotes: 1
