Reputation: 47
I have a folder with full of excel files and subfolders.Subfolders also contain excel files.
Using os.walk() i have been able to go through each subfolder and get the data from that excel file.My problem is , i want to take only mobile numbers column from all excel files and store them in database. Issue with that is , column name for mobiles number is different in each excel file and there are thousand excel files.
So i decided to get columns by its datatypes and length. (e.g. if column datatype is integer and length of each cell in that particular column is 10 then i would push that data.)
But i don't understand how to do this. I looped though columns and rows but all i got is repeatative data or error. Can someone please help me out with this ? Would be glad if you helps cause i am doing this since two days but failing. Thanks in advance.
def file_access(file_path_list):
for path in file_path_list:
if path.endswith(('xlsx' , 'xls' , 'XLS')):
print '------------------------------\n\n'
folder = path.split('/')[-2]
sheet = path.split('/')[-1]
print folder , sheet
df = pd.read_excel(path, sheet_name=None, header=None)
# Here i am trying to get data from df object but failing to do so.
# You can suggest me code from this stage.
Upvotes: 0
Views: 1208
Reputation: 648
To obtain just phone number in many excel files you can try the following code:
import pandas as pd
import xlrd
import os
mydir = (os.getcwd()).replace('\\','/') + '/'
#Get all excel files include subdir
filelist=[]
for path, subdirs, files in os.walk(mydir):
for file in files:
if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')):
filelist.append(os.path.join(path, file))
number_of_files=len(filelist)
# Get data of cells from excel
data=[]
for i in range(number_of_files):
#df.append(pd.read_excel(r''+ mydir +filelist[i]))
df=pd.read_excel(r''+filelist[i])
l=len(df.iloc[0])
for n in range(l):
if len(str(df.iloc[0][n])) >= 10:
data.append(df.iloc[:][df.axes[1][n]])
break
res=[]
for i in range(len(data)):
res.append(data[i].values.tolist())
print(res)
To get all phone data from different column's name, it's used the length of phone number to differentiate to other column data. For this case i has used the length of phone number used in my country that is 11 in length (ex: 82330403045).
Output:
>>> data
[0 82330403045
1 82330403046
2 82330403047
3 82330403048
Name: Phone, dtype: int64, 0 82330403049
1 82330403050
2 82330403051
3 82330403052
Name: PhoneCell, dtype: int64]
>>> res
[[82330403045, 82330403046, 82330403047, 82330403048], [82330403049, 82330403050, 82330403051, 82330403052], [82330403049, 82330403050, 82330403051, 82330403052], [82330403045, 82330403046, 82330403047, 82330403048], [82330403049, 82330403050, 82330403051, 82330403052], [82330403049, 82330403050, 82330403051, 82330403052]]
You can use this output data for your database.
My Excel files:
Book1.xlsx file:
Book2.xlsx file:
Upvotes: 1