Reputation: 1765
I have a file with the following contents:
UserID Email
1001 [email protected]
1001 [email protected]
1002 [email protected]
1002 [email protected]
I want to store the data like this:
ROW COLUMN+CELL
1001 column=cf:Email, timestamp=1487917201278, [email protected]
1001 column=cf:Email, timestamp=1487917201279, [email protected]
1002 column=cf:Email, timestamp=1487917201286, [email protected]
1002 column=cf:Email, timestamp=1487917201287, [email protected]
I am using put, for example:
put 'table', '1001', 'cf:Email', '[email protected]'
but it gives me:
ROW COLUMN+CELL
1001 column=cf:Email, timestamp=1487917201279, [email protected]
1002 column=cf:Email, timestamp=1487917201286, [email protected]
It is overwriting the previous value. But HBase is supposed to store multiple values for a particular column, keyed by timestamp. Is there any way I can store both email addresses for a particular UserID?
Upvotes: 1
Views: 1168
Reputation: 2155
You may want to take a closer look at the HBase documentation on versions. Note especially where it says:
"By default, i.e. if you specify no explicit version, when doing a get, the cell whose version has the largest value is returned"
But I wouldn't use multiple versions to store multiple values this way. You have to explicitly set the maximum number of versions, and that setting applies to every column in that family. I would be more inclined to use distinct column names (such as Email1, Email2, ...).
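A minimal sketch of the distinct-column approach in the HBase shell, assuming the same 'table' and 'cf' family as in the question. The qualifiers Email1 and Email2 follow the naming suggested above, and the addresses are hypothetical placeholders:

```
# Each address goes into its own column qualifier, so neither put replaces the other
put 'table', '1001', 'cf:Email1', 'first@example.com'
put 'table', '1001', 'cf:Email2', 'second@example.com'

# A plain get returns both columns for the row
get 'table', '1001'
```

This keeps the family at its default version count, at the cost of having to pick the next free qualifier when adding an address.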
Upvotes: 1
Reputation: 1403
You need to specify the number of versions for the "cf" column family. By default, the number of versions is 1. Run the following in the HBase shell to modify the existing table:
alter 'table', {NAME => 'cf', VERSIONS => 2147483647}
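Raising the limit only affects how many versions are kept. A plain get or scan still returns just the newest version of each cell, so the read has to ask for more. A short sketch, assuming the alter above has been applied and the two puts for a row landed on different timestamps (the count of 5 is an arbitrary choice for illustration):

```
# Confirm the new VERSIONS setting on the 'cf' family
describe 'table'

# Ask for up to 5 versions of the Email cell for one row
get 'table', '1001', {COLUMN => 'cf:Email', VERSIONS => 5}

# Or request multiple versions across the whole table
scan 'table', {VERSIONS => 5}
```

Note that two puts to the same row and column with an identical timestamp still collapse into one cell, since the timestamp is what distinguishes versions.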
Read more about versions in HBase here.
Upvotes: 1