Ignoring multiple commas while reading csv in pandas

Question

I m trying to read multiple files whose names start with 'site_%'. Example, file names like site_1, site_a. Each file has data like :

Login_id, Web
1,http://www.x1.com
2,http://www.x1.com,as.php

I need two columns in my pandas df: Login_id and Web.

I am facing error when I try to read records like 2.

df_0 = pd.read_csv('site_1',sep='|')
df_0[['Login_id, Web','URL']] = df_0['Login_id, Web'].str.split(',',expand=True)

I am facing the following error : ValueError: Columns must be same length as key.

Please let me know where I am doing some serious mistake and any good approach to solve the problem. Thanks

jottbe · Accepted Answer

Solution 1: use split with argument n=1 and expand=True.

result= df['Login_id, Web'].str.split(',', n=1, expand=True)
result.columns= ['Login_id', 'Web']

That results in a dataframe with two columns, so if you have more columns in your dataframe, you need to concat it with your original dataframe (that also applies to the next method).

EDIT Solution 2: there is a nicer regex-based solution which uses a pandas function:

result= df['Login_id, Web'].str.extract('^\s*(?P[^,]*),\s*(?P.*)', expand=True)

This splits the field and uses the names of the matching groups to create columns with their content. The output is:

  Login_id                       URL
0        1         http://www.x1.com
1        2  http://www.x1.com,as.php

Solution 3: convetional version with regex: You could do something customized, e.g with a regex:

import re
sp_re= re.compile('([^,]*),(.*)')

aux_series= df['Login_id, Web'].map(lambda val: sp_re.match(val).groups())
df['Login_id']= aux_series.str[0]
df['URL']= aux_series.str[1]

The result on your example data is:

                Login_id, Web Login_id                       URL
0         1,http://www.x1.com        1         http://www.x1.com
1  2,http://www.x1.com,as.php        2  http://www.x1.com,as.php

Now you could drop the column 'Login_id, Web'.

Ignoring multiple commas while reading csv in pandas

Answers (1)

Related Questions