Reputation: 838
I m trying to read multiple files whose names start with 'site_%'. Example, file names like site_1, site_a. Each file has data like :
Login_id, Web
1,http://www.x1.com
2,http://www.x1.com,as.php
I need two columns in my pandas df: Login_id and Web.
I am facing error when I try to read records like 2.
df_0 = pd.read_csv('site_1',sep='|')
df_0[['Login_id, Web','URL']] = df_0['Login_id, Web'].str.split(',',expand=True)
I am facing the following error : ValueError: Columns must be same length as key.
Please let me know where I am doing some serious mistake and any good approach to solve the problem. Thanks
Upvotes: 1
Views: 541
Reputation: 4521
Solution 1: use split
with argument n=1
and expand=True
.
result= df['Login_id, Web'].str.split(',', n=1, expand=True)
result.columns= ['Login_id', 'Web']
That results in a dataframe with two columns, so if you have more columns in your dataframe, you need to concat it with your original dataframe (that also applies to the next method).
EDIT Solution 2: there is a nicer regex-based solution which uses a pandas function:
result= df['Login_id, Web'].str.extract('^\s*(?P<Login_id>[^,]*),\s*(?P<URL>.*)', expand=True)
This splits the field and uses the names of the matching groups to create columns with their content. The output is:
Login_id URL
0 1 http://www.x1.com
1 2 http://www.x1.com,as.php
Solution 3: convetional version with regex: You could do something customized, e.g with a regex:
import re
sp_re= re.compile('([^,]*),(.*)')
aux_series= df['Login_id, Web'].map(lambda val: sp_re.match(val).groups())
df['Login_id']= aux_series.str[0]
df['URL']= aux_series.str[1]
The result on your example data is:
Login_id, Web Login_id URL
0 1,http://www.x1.com 1 http://www.x1.com
1 2,http://www.x1.com,as.php 2 http://www.x1.com,as.php
Now you could drop the column 'Login_id, Web'.
Upvotes: 1