Reputation: 17
Here's what I have so far:
This is an example of the DataFrame:
   A  B   C   D
1  2  7  12  14
2  4  5  11  23
3  4  6  14  20
4  4  7  13  50
5  9  6  14  35
Here is an example of my efforts:
import time
import pandas as pd
then = time.time()
count = 0
df = pd.read_csv('Get_Numbers.csv')
df.columns = ['A', 'B', 'C', 'D']
while True:
    df_elements = df.sample(n=1)
    random_row = df_elements
    print(random_row)
    find_this_row = df['A','B','C','D' == '4','7','13,'50']
    print(find_this_row)
    if find_this_row != random_row:
        count += 1
    else:
        break
print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))
now = time.time()
print("It took: ", now-then, " seconds")
The above code gives an obvious error, but I have tried so many different ways of getting the find_this_row numbers that I just don't know what to do anymore, so I left this attempt in.
What I would like to avoid is using the specific index of the row I am trying to find; I would rather use just the values to find it.
I am using df_elements = df.sample(n=1) to select a row at random. This was to avoid using random.choice, as I was not sure whether that would work or which way is more time/memory efficient, but I'm open to advice on that as well.
In my mind it seems simple: randomly select a row of data and, if it doesn't match the row I want, keep randomly selecting rows until one does. But I can't seem to execute it.
Any help is EXTREMELY appreciated!
Upvotes: 1
Views: 1926
Reputation: 173
A couple of hints first. This line does not work for me:
find_this_row = df['A','B','C','D' == '4','7','13,'50']
For 2 reasons:
df['A','B','C','D' ...
Either pass a list of keys to get a DataFrame:
df[['A','B','C','D']]
or a single key to get a Series:
df['A']
Since you need the whole row across multiple columns, take its values:
df2.iloc[4].values
array(['4', '7', '13', '50'], dtype=object)
Do the same with your sample row:
df2.sample(n=1).values
Comparison between rows needs to be done for all() elements/columns:
df2.sample(n=1).values == df2.iloc[4].values
array([[ True, False, False, False]])
Adding .all() reduces that to a single value:
(df2.sample(n=1).values == df2.iloc[4].values).all()
which returns True or False.
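To see the two comparison steps on the question's own data, here is a minimal sketch. It rebuilds the example frame by hand as df (with integer values, which is an assumption; the output above shows strings) and looks rows up by label rather than position. The names target and other are hypothetical.

```python
import pandas as pd

# Rebuild the example frame from the question (integer values assumed).
df = pd.DataFrame(
    {"A": [2, 4, 4, 4, 9], "B": [7, 5, 6, 7, 6],
     "C": [12, 11, 14, 13, 14], "D": [14, 23, 20, 50, 35]},
    index=[1, 2, 3, 4, 5],
)

target = df.loc[4].values   # row labelled 4 as a 1D array
other = df.loc[2].values    # row labelled 2 as a 1D array

elementwise = other == target        # one boolean per column
same_row = bool(elementwise.all())   # True only if every column matches

print(elementwise)
print(same_row)
print(bool((target == target).all()))
```

Only column A agrees between the two rows, so the elementwise result is mixed and .all() rejects the pair.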
All together:
import time
import pandas as pd
then = time.time()
count = 0
while True:
    random_row = df2.sample(n=1).values
    find_this_row = df2.iloc[4].values
    if not (random_row == find_this_row).all():
        count += 1
    else:
        break
print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))
now = time.time()
print("It took: ", now-then, " seconds")
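The loop above looks the target up by position. Since the question asks to avoid the index, here is a hedged sketch that specifies the target purely by its values. The frame is rebuilt by hand from the question's example with integer values (an assumption), and the names wanted and exists are hypothetical.

```python
import pandas as pd

# Example frame from the question, rebuilt by hand (integer values assumed).
df = pd.DataFrame(
    {"A": [2, 4, 4, 4, 9], "B": [7, 5, 6, 7, 6],
     "C": [12, 11, 14, 13, 14], "D": [14, 23, 20, 50, 35]},
    index=[1, 2, 3, 4, 5],
)

wanted = [4, 7, 13, 50]  # the values to look for, no index needed

# Sanity check: if the row is not in the frame, the loop would never end.
exists = (df.values == wanted).all(axis=1).any()
if not exists:
    raise ValueError("target row is not in the DataFrame")

count = 0
while True:
    random_row = df.sample(n=1).values[0]   # 1D array for the sampled row
    if (random_row == wanted).all():
        break
    count += 1

print(f"Found {wanted} after {count} failed tries")
```

The existence check is optional, but it turns a silent infinite loop into an immediate error when the target values contain a typo.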
Upvotes: 0
Reputation: 59579
Here's a method that tests one row at a time. We check whether the values of the chosen row are equal to the values of the sampled DataFrame, and we require that they all match.
row = df.sample(1)
counter = 0
not_a_match = True
while not_a_match:
    not_a_match = not (df.sample(n=1).values == row.values).all()
    counter += 1
print(f'It took {counter} tries and the numbers were\n{row}')
# It took 9 tries and the numbers were
#    A  B   C   D
# 4  4  7  13  50
If you want it to be a little faster, you can select one row and then sample the DataFrame with replacement many times at once. You can then find the first position at which a sampled row equals your chosen row, which gives the number of 'tries' a while loop would have needed, but in much less time. The loop protects against the unlikely case that a batch contains no match, which is possible because it samples with replacement.
row = df.sample(1)
n = 0
none_match = True
k = 10  # Increase to check more matches at once.
while none_match:
    matches = (df.sample(n=len(df)*k, replace=True).values == row.values).all(1)
    none_match = ~matches.any()  # Determine if none still match
    n += k*len(df)*none_match  # Only increment if none match
n = n + matches.argmax() + 1
print(f'It took {n} tries and the numbers were\n{row}')
# It took 3 tries and the numbers were
#    A  B   C   D
# 4  4  7  13  50
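The batched version leans on two NumPy behaviours: .all(1) reduces each sampled row to a single boolean, and argmax() on a boolean array returns the position of the first True. A small deterministic sketch, with a hand-written sampled array standing in for the random batch (made up for illustration):

```python
import numpy as np

row = np.array([[4, 7, 13, 50]])            # target, shape (1, 4)
sampled = np.array([[2, 7, 12, 14],
                    [4, 5, 11, 23],
                    [4, 7, 13, 50],          # first match, position 2
                    [9, 6, 14, 35],
                    [4, 7, 13, 50]])         # later match, ignored by argmax

matches = (sampled == row).all(1)   # one boolean per sampled row
first = int(matches.argmax())       # position of the first True
tries = first + 1                   # positions are 0-based, tries are 1-based

print(matches)
print(tries)

# Caveat: argmax() also returns 0 when nothing matches,
# which is why the any() check is needed before trusting it.
no_match = np.zeros(5, dtype=bool)
print(no_match.any(), int(no_match.argmax()))
```

This is why the count is only finalised with argmax() + 1 after a batch is known to contain at least one match.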
Upvotes: 0
Reputation: 423
You can use values, which returns an np.ndarray of shape (1, 2); use values[0] to get just the 1D array.
Then compare the arrays with any():
import time
import pandas as pd
then = time.time()
df = pd.DataFrame(data={'A': [1, 2, 3],
                        'B': [8, 9, 10]})
find_this_row = [2, 9]
print("Looking for: {}".format(find_this_row))
count = 0
while True:
    random_row = df.sample(n=1).values[0]
    print(random_row)
    if any(find_this_row != random_row):
        count += 1
    else:
        break
print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))
now = time.time()
print("It took: ", now-then, " seconds")
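The question also asks about alternatives to df.sample. One possibility, sketched here as an assumption rather than a benchmarked recommendation, is to convert the frame to a NumPy array once and draw random row positions with NumPy's random generator, so that no new DataFrame is built on each try. The names values, rng and pos are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [8, 9, 10]})
find_this_row = np.array([2, 9])

values = df.to_numpy()              # convert once, outside the loop
rng = np.random.default_rng(0)      # seeded only to make the run repeatable

count = 0
while True:
    pos = rng.integers(len(values))  # random row position in [0, len(values))
    if (values[pos] == find_this_row).all():
        break
    count += 1

print(f"Matched {values[pos]} after {count} failed tries")
```

Whether this is actually faster depends on the data and is worth timing on the real file before switching.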
Upvotes: 1
Reputation: 681
How about using values? values returns a NumPy array of the row's values, and two such arrays are easy to compare.
array1 == array2 returns an array of True and False values, because it compares the elements at corresponding positions (two plain Python lists compared with == would give a single bool instead). You can then check whether all of the returned values are True, for example with .all().
Upvotes: 1