Reputation: 219
I have a large cell array A, sorted by FIRM (A(:,2)), and I want to delete every row whose firm does not appear at least 3 times in a row. In this example, A:
FIRM
1997 'ABDR' 0,56 464 1641 19970224
1997 'ABDR' 0,65 229 9208 19970424
1997 'ABDR' 0,55 125 31867 19970218
1997 'ABD' 0,06 435 8077 19970311
1997 'ABD' 0,00 150 44994 19970804
1997 'ABFI' 2,07 154 46532 19971209
I would keep only A:
1997 'ABDR' 0,56 464 1641 19970224
1997 'ABDR' 0,65 229 9208 19970424
1997 'ABDR' 0,55 125 31867 19970218
Thanks a lot.
Notes:
I used fopen and textscan to import the csv file.
I modified some variables so that all of them fit in a cell-type variable.
I converted some numeric elements into strings:
F_x=num2cell(Data{:,x});
I created a new variable containing just the year:
F_ya=max(0,fix(log10(F_y)+1)-4);
F_yb=fix(F_y./10.^F_ya);
F_yc = num2cell(F_yb);
I created a new cell A with the variables I need:
A=[F_5C Data{:,1} Data{:,2} Data{:,3} Data{:,4} F_xa F_xb];
This means that the cell contains some variables that are strings and others that are numbers.
Upvotes: 0
Views: 243
Reputation: 104483
I'm going to assume that your names are stored in a cell array. As such, your names would actually be:
names = {'ABDR', 'ABDR', 'ABDR', 'ABD', 'ABD', 'ABFI'};
We can then use strcmpi. This function compares two strings and returns true if they match and false otherwise. The comparison is case insensitive, so ABDR would be the same as abdr.
You would call strcmpi like so:
v = strcmpi(str1, str2);
Alternatively, str2 can be a cell array. In that case, strcmpi takes the single string str1 and compares it with the string in each cell of the cell array. It then returns a logical vector of the same size as str2 that indicates whether there is a match at each location.
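To make the cell-array form concrete, here is a minimal sketch that compares one string against the example names. The variable name matches is only illustrative.

```matlab
% Example names, as assumed in this answer
names = {'ABDR', 'ABDR', 'ABDR', 'ABD', 'ABD', 'ABFI'};

% Compare a single string against every cell, ignoring case.
% strcmpi compares whole strings, so 'abdr' does not match 'ABD'.
matches = strcmpi('abdr', names);

% matches is a 1x6 logical vector: 1 1 1 0 0 0
disp(matches);
```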
As such, we can go through each element of names and count how many matches it has across the entire names cell array. We can then figure out which locations to select by checking whether each name has at least 3 matches. In other words, we sum the logical vector for each string in names and keep those that sum to 3 or more. We can use cellfun to help us perform this. As such:
sums = cellfun(@(x) sum(strcmpi(x,names)), names);
Doing this thus gives:
sums =
3 3 3 2 2 1
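Since the cellfun call compares every name against every other name, it may be slow on a large variable. As a possible alternative, and not the method used in this answer, the same counts can be sketched with unique and accumarray. Note that unique is case sensitive, unlike strcmpi, so this assumes the firm names are consistently cased. The names ic, counts and sums_alt are illustrative.

```matlab
names = {'ABDR', 'ABDR', 'ABDR', 'ABD', 'ABD', 'ABFI'};

% ic maps each entry of names to its index among the unique names
[~, ~, ic] = unique(names);

% Count how many times each unique name occurs (case-sensitive)
counts = accumarray(ic(:), 1);

% Expand the per-name counts back to one count per entry, as a row vector
sums_alt = counts(ic(:))';

% sums_alt is: 3 3 3 2 2 1
disp(sums_alt);
```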
Now, we need those locations that have three or more. As such:
locations = sums >= 3
locations =
1 1 1 0 0 0
These are the rows that you need to keep from your matrix, expressed as a logical vector. Assuming that A contains your data, you would simply do A(locations,:) to keep only those rows whose name occurs three or more times. I really don't know how you constructed A, so I'm assuming it's like a 2D matrix. If you put in the code that you used to construct this matrix, I'll modify my post to get it working for you. In any case, what's important is locations. This tells you which rows you need to select to match your criteria.
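Putting the steps together, here is a minimal sketch under the assumption that A is a 2D cell array with the firm name in column 2, as the question's A(:,2) suggests. The data is retyped from the question with decimal points instead of commas, and B is a hypothetical name for the filtered result. Because the data is sorted by firm, the total count per name equals the length of its consecutive run.

```matlab
% Assumed layout: one row per record, firm name in column 2
A = {1997, 'ABDR', 0.56, 464,  1641, 19970224;
     1997, 'ABDR', 0.65, 229,  9208, 19970424;
     1997, 'ABDR', 0.55, 125, 31867, 19970218;
     1997, 'ABD',  0.06, 435,  8077, 19970311;
     1997, 'ABD',  0.00, 150, 44994, 19970804;
     1997, 'ABFI', 2.07, 154, 46532, 19971209};

% Pull out the firm names as a 6x1 cell array of strings
names = A(:, 2);

% For each name, count how many rows carry the same name (case-insensitive)
sums = cellfun(@(x) sum(strcmpi(x, names)), names);

% Logical column vector marking rows whose firm occurs at least 3 times
locations = sums >= 3;

% Keep only those rows; B (hypothetical name) is a 3x6 cell array
B = A(locations, :);
```

Parenthesis indexing, A(locations,:), keeps B as a cell array, so the mixed string and numeric columns are preserved.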
Upvotes: 1