Reputation: 1021
My Elasticsearch index has more than 1000 fields due to my SQL schema, and I get the exception below:
{'type': 'illegal_argument_exception', 'reason': 'Limit of total fields [1000] in index }
And my bulk insert looks like this:
import json
from elasticsearch import Elasticsearch, helpers
from elasticsearch.helpers import BulkIndexError

es = Elasticsearch()
BATCHSIZE = 1000
batch = []
counter = 0
insertedrecords = 0

with open('audit1.txt') as file:
    for line in file:
        # build a fresh dict per line instead of reusing (and shadowing) the built-in `dict`;
        # reusing one shared dict and clearing it would empty every entry already in the batch
        doc = {}
        tempdict = {}
        columns = line.split('||')
        doc['TimeStamp'] = columns[0].strip('\'')
        doc['BusinessTimeStamp'] = columns[1].strip('\'')
        doc['RuntimeMicroflowID'] = columns[2].strip('\'')
        doc['MicroflowID'] = columns[3].strip('\'')
        doc['UserId'] = columns[4].strip('\'')
        doc['ClientId'] = columns[5].strip('\'')
        doc['Userlocation'] = columns[6].strip('\'')
        doc['Transactionid'] = columns[7].strip('\'')
        doc['Catagorie'] = columns[8].strip('\'')
        doc['EventType'] = columns[9].strip('\'')
        doc['Operation'] = columns[10].strip('\'')
        doc['PrimaryData'] = columns[11].strip('\'')
        doc['SecondayData'] = columns[12].strip('\'')
        # the remaining columns arrive as (key, old value, new value) triples
        i = 13
        while i + 2 < len(columns):
            tempdict['BFOLDVALUE'] = columns[i + 1].strip('\'')
            tempdict['BFNEWVALUE'] = columns[i + 2].strip('\'')
            key = columns[i].strip('\'')
            if key:  # strip() never returns None, so test for an empty key instead
                doc[key] = tempdict.copy()
            i += 3
            tempdict.clear()
        # print(json.dumps(doc, indent=4))
        batch.append(doc)
        counter += 1
        if counter == BATCHSIZE:
            try:
                helpers.bulk(es, batch, index='audit-index', doc_type='audit')
                insertedrecords += counter
                counter = 0
                batch.clear()
                print(insertedrecords, " - records have been inserted")
            except BulkIndexError as e:
                print("Error occurred -- continuing")
                print(json.dumps(doc, indent=4))
                print(e)
                batch.clear()
                break

# flush whatever is left in the batch after the file ends
if batch:
    helpers.bulk(es, batch, index='audit-index', doc_type='audit')
    insertedrecords += counter
So I am assuming I am indexing this wrongly... is there a better way of indexing this kind of format in Elasticsearch? Note that I am using ELK version 7.5.
Here is the sample file I am parsing to elasticsearch:
2018.07.17/15:41:53.735||2018.07.17/15:41:53.735||'0164a8424fbbp84h%2139165'||'BT_TTB_CashDep_PRC'||'eskedarz'||'UXP'||'00001039'||'0164a842e519pJpA'||'Persistence'||''||'CREATE'||'DailyTxns'||'0164a842e4eapJnu'||'CurrentThread'||'WebContainer : 15'||''||'ParentThread'||'system'||''||'TCPWorkerThreadID'||'WebContainer : 15'||''||'f_POSTINGDT'||'2018-07-17'||''||'versionNum'||'0'||''||'f_TXNAMTDR'||'0'||''||'f_ACCOUNTID'||'013XXXXXXXXX0'||''||'f_VALUEDTTM'||'2018-07-17 15:41:53.0'||''||'f_POSTINGDTTM'||'2018-07-17 15:41:53.692'||''||'f_TXNCLBAL'||'25551.610000'||''||'f_TXNREF'||'0000103917071815410685326'||''||'f_PIEVENTTYPE'||'N'||''||'f_TXNAMT'||'5000.00'||''||'f_TRANSACTIONID'||'0164a842e4e9pJng'||''||'f_TYPE'||'N'||''||'f_USERID'||'xxxarz'||''||'f_SRNO'||'1'||''||'f_TXNBASEEQ'||'5000.00'||''||'f_TXNSRCBRANCH'||'0000X039'||''||'f_TXNCODE'||'T08'||''||'f_CHANNELID'||'BranchTeller'||''||'f_TXNAMTCR'||'5000.00'||''||'f_TXNNARRATION'||'SELF '||''||'f_ISACCRUALPENDING'||'false'||''||'f_TXNDTTM'||'2018-07-17 15:41:53.689'||''
Upvotes: 1
Views: 8709
Reputation: 1021
A better way to handle such an exploding index is to normalize it, much as you would in an RDBMS: instead of one top-level field per key, store the variable key/value combinations in a single nested structure.
Example: turn a record like

{"keyA": "ValueA", "keyB": "ValueB", "keyC": "ValueC", ...}

into

{"keyA": "ValueA", "Keyvalue": [{"Key": "keyB", "Value": "ValueB"}, {"Key": "keyC", "Value": "ValueC"}]}

so a search becomes a nested query matching Keyvalue.Key == "keyB" and Keyvalue.Value == "ValueB" within the same pair. This way the mapping has a fixed, small number of fields no matter how many distinct keys the data contains.
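For illustration, here is a minimal sketch of that normalization with the Python client the question already uses. The Keyvalue/Key/Value field names, the audit-index-nested index name, and the sample values are assumptions chosen for the example, not anything prescribed by Elasticsearch:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Map Keyvalue as `nested` so each Key/Value pair is indexed as its own
# hidden sub-document instead of one top-level field per key.
es.indices.create(index='audit-index-nested', body={
    "mappings": {
        "properties": {
            "Keyvalue": {
                "type": "nested",
                "properties": {
                    "Key": {"type": "keyword"},
                    "Value": {"type": "keyword"}
                }
            }
        }
    }
})

# One record from the sample file, normalized: fixed columns stay top-level,
# the variable key/value triples collapse into a single nested array.
doc = {
    "Operation": "CREATE",
    "Keyvalue": [
        {"Key": "f_TXNCODE", "Value": "T08"},
        {"Key": "f_CHANNELID", "Value": "BranchTeller"}
    ]
}
es.index(index='audit-index-nested', body=doc)

# A nested query matches Key and Value within the same pair.
res = es.search(index='audit-index-nested', body={
    "query": {
        "nested": {
            "path": "Keyvalue",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"Keyvalue.Key": "f_TXNCODE"}},
                        {"term": {"Keyvalue.Value": "T08"}}
                    ]
                }
            }
        }
    }
})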
Upvotes: 0
Reputation: 32376
If you look carefully at this part of the error message, the cause becomes clear:
Limit of total fields [1000] in index
1000 is the default limit on the total number of fields in an Elasticsearch index, as shown in the Elasticsearch source code:
public static final Setting<Long> INDEX_MAPPING_TOTAL_FIELDS_LIMIT_SETTING =
    Setting.longSetting("index.mapping.total_fields.limit", 1000L, 0, Property.Dynamic, Property.IndexScope);
Please note this is a dynamic setting, so it can be changed on a given index without a reindex, by updating the index settings:
PUT test_index/_settings
{
  "index.mapping.total_fields.limit": 1500
}

Change 1500 to whatever is suitable for your index.
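If you prefer to do this from the Python script in the question, a minimal sketch with the official client (assuming elasticsearch-py 7.x and the audit-index name used above):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# index.mapping.total_fields.limit is dynamic, so this takes effect on the
# live index immediately.
es.indices.put_settings(
    index='audit-index',
    body={"index.mapping.total_fields.limit": 1500}
)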
More info on this issue can be found here and here.
Upvotes: 3