Reputation: 35

Check data from a list against a CSV Python

Hi, I am new to Python and am trying to improve my knowledge by writing a usable function. It creates a list of 6 random numbers drawn from the range 1 to 59. I've cracked that part; it's the next part that is tricky. I want to check a CSV file for the numbers in the random set and print a notification if two or more numbers from that set are found. I have tried print(df[df[0:].isin(luckyDip)]) with some success: it checks the data frame for the numbers in the set and shows the values that match, BUT it also shows the rest of the data frame as NaN, which is not very tidy and not really what I want.

I'm just looking for some pointers on what to do next, or what to search Google for. Below is the code I've been experimenting with.

import random
import pandas as pd

url = 'https://www.national-lottery.co.uk/results/euromillions/draw-history/csv'
df = pd.read_csv(url, sep=',', na_values=".")

lottoNumbers = [1,2,3,4,5,6,7,8,9,10,
           11,12,13,14,15,16,17,18,19,20,
           21,22,23,24,25,26,27,28,29,30,
           31,32,33,34,35,36,37,38,39,40,
           41,42,43,44,45,46,47,48,49,50,
           51,52,53,54,55,56,57,58,59]
luckyDip = random.sample(lottoNumbers, k=6)  # Picks 6 numbers at random
print(sorted(luckyDip))
print(df[df[0:].isin(luckyDip)])

Upvotes: 1

Answers (3)

Troy D

Reputation: 2245

You can build on what you have by counting the non-null values in each row, then displaying the rows where the number of matches is greater than or equal to 2.

match_count = df[df[0:].isin(luckyDip)].notnull().sum(axis=1)
print(match_count[match_count >= 2])

This gives you the index value of the matching row and the number of matches.
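
As a rough illustration of the same counting idea, here is a minimal sketch on a small made-up frame. The column names and values are invented for the example and are not taken from the lottery CSV, and the picks are fixed rather than random so the result is repeatable. It sums the boolean mask from isin() directly instead of counting non-null values, which should give the same counts.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded draw history; the column
# names and values are made up for illustration.
df = pd.DataFrame(
    {
        "Ball 1": [3, 19, 7, 19],
        "Ball 2": [19, 22, 8, 23],
        "Ball 3": [23, 41, 9, 34],
    }
)
lucky_dip = [19, 23, 34, 41, 50, 59]  # fixed picks instead of random.sample

# isin() returns a boolean frame; summing across columns counts
# how many values in each row appear in lucky_dip.
match_count = df.isin(lucky_dip).sum(axis=1)

# Keep only the rows with two or more matches.
hits = match_count[match_count >= 2]
print(hits)
```

Here rows 0, 1 and 3 are kept, with 2, 2 and 3 matches respectively.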

If you also want the matching values from these rows, you can add:

index = match_count[match_count >= 2].index
matches = [tuple(x[~pd.isnull(x)]) for x in df.loc[index][df[0:].isin(luckyDip)].values]
print(matches)

Example output:

[(19.0, 23.0), (19.0, 41.0), (19.0, 23.0, 34.0), (23.0, 28.0)]
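
The values above come out as floats because masking non-matches with NaN forces a float dtype. If integers are preferred, one possible variation is to index each row with its own boolean mask. This is only a sketch on made-up data (the frame, column names and fixed picks are assumptions for the example), not a drop-in replacement for the code above.

```python
import pandas as pd

# Hypothetical data; column names and values are made up for illustration.
df = pd.DataFrame(
    {
        "Ball 1": [3, 19, 7, 19],
        "Ball 2": [19, 22, 8, 23],
        "Ball 3": [23, 41, 9, 34],
    }
)
lucky_dip = [19, 23, 34, 41, 50, 59]  # fixed picks so the result is repeatable

mask = df.isin(lucky_dip)              # True where a value is one of the picks
enough = mask.sum(axis=1) >= 2         # rows with two or more matches

# Map each qualifying row label to the matching values in that row.
# No NaN is introduced, so the values stay integers.
matches = {
    idx: [int(v) for v in df.loc[idx][mask.loc[idx]]]
    for idx in df.index[enough]
}
print(matches)
```

Keying the result by row label keeps the link back to the draw that produced each set of matches.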

Upvotes: 0

RicLeal

Reputation: 951

Not as elegant as @ayhan's solution, but this works:

import random
import pandas as pd

url ='https://www.national-lottery.co.uk/results/euromillions/draw-history/csv'
df = pd.read_csv(url, index_col=0,  sep=',')

lottoNumbers = range(1, 60)

tries = 0
while True:
    tries+=1
    luckyDip = random.sample(lottoNumbers, k=6) #Picks 6 numbers at random

    # subset of balls
    draws = df.iloc[:,0:7]

    # True where there is match
    matches = draws.isin(luckyDip)

    # Gives the sum of Trues
    sum_of_trues = matches.sum(1)

    # you are looking for matches where sum_of_trues is 6
    final = sum_of_trues[sum_of_trues == 6]
    if len(final) > 0:
        print("Took", tries)
        print(final)
        break

The result is something like this:

Took 15545
DrawDate
16-May-2017    6
dtype: int64
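
Since the loop above keeps drawing until a match turns up, it can run for a long time. A possible variation, sketched below under some assumptions, wraps the same isin() and sum logic in a function with a cap on the number of tries and an optional seed. The function name tries_until_match, its parameters, and the tiny frame are hypothetical and exist only for this example.

```python
import random
import pandas as pd


def tries_until_match(draws, wanted, max_tries=100_000, seed=None):
    """Draw 6 numbers from 1..59 until some row of `draws` has at least
    `wanted` matches. Returns (tries, hits), or None if the cap is reached."""
    rng = random.Random(seed)  # seeded generator makes runs repeatable
    for tries in range(1, max_tries + 1):
        picks = rng.sample(range(1, 60), k=6)
        per_row = draws.isin(picks).sum(axis=1)  # matches per draw
        hits = per_row[per_row >= wanted]
        if len(hits) > 0:
            return tries, hits
    return None


# Made-up draws indexed by date, standing in for the real CSV.
draws = pd.DataFrame(
    {"Ball 1": [1, 10, 20], "Ball 2": [2, 11, 21], "Ball 3": [3, 12, 22]},
    index=pd.Index(["01-Jan-2000", "02-Jan-2000", "03-Jan-2000"], name="DrawDate"),
)

result = tries_until_match(draws, wanted=2, seed=1)
if result is not None:
    tries, hits = result
    print("Took", tries)
    print(hits)
```

Returning None when the cap is hit lets the caller decide whether to retry or give up, rather than looping indefinitely.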

Upvotes: 1

user2188329

Reputation: 103

If you're just looking to flatten the array and remove the NaN values, you can add this to the end of your code (note that it needs NumPy imported):

    import numpy as np

    matches = df[df[0:].isin(luckyDip)].values.flatten().astype(np.float64)
    print(matches[~np.isnan(matches)])
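
A related option, shown here only as a sketch on made-up data, is to index the underlying NumPy array with the boolean mask directly. Boolean indexing of a 2-D array returns a flat 1-D array of the selected values, so no NaN filtering or float conversion is needed. The frame, column names and fixed picks below are assumptions for the example.

```python
import pandas as pd

# Hypothetical data; column names and values are made up for illustration.
df = pd.DataFrame(
    {
        "Ball 1": [3, 19, 7, 19],
        "Ball 2": [19, 22, 8, 23],
        "Ball 3": [23, 41, 9, 34],
    }
)
lucky_dip = [19, 23, 34, 41, 50, 59]  # fixed picks so the result is repeatable

mask = df.isin(lucky_dip).to_numpy()  # 2-D boolean array, True on matches

# Selecting with a 2-D boolean mask flattens the result in row-major order.
flat = df.to_numpy()[mask]
print(flat)
```

This assumes every column is numeric; with mixed column types the array would have object dtype.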

Upvotes: 0
