Reputation: 1
I am reading data from a .dat file.
Here is an example of what the dataset looks like:
38 39 41 109 110
39 111 112 113 114 115 116 117 118
119 120 121 122 123 124 125 126 127 128 129 130 131 132 133
48 134 135 136
39 48 137 138 139 140 141 142 143 144 145 146 147 148 149
What I'm trying to do is read the data file and get a random row from it, like:
119 120 121 122 123 124 125 126 127 128 129 130 131 132 133
I've been doing this:
data_url = "someurl.dat"
market_basket = pd.read_csv(data_url, header=None, delimiter='\n+', engine="python")
sample = market_basket.sample(n=1)
But when I output the value of sample, this is what I get:
0
40911 39 2787 2858 5016 5041 13569
Moreover, when I look for the output row, I can't find it in my dataset. Why?
Upvotes: 0
Views: 282
Reputation: 149075
This is a pandas variation on Rafaël's answer.
Pandas read_csv can read one single line from a file, thanks to the skiprows and nrows parameters. The hard part is in fact how to find a random line number...
So a simple way is to read all lines from the input file, choose a random one and feed that single line into the dataframe:
import pandas as pd
import random
import io
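with open("someurl.dat") as fd:
    # read every line, then pick one at random
    line = random.choice(fd.readlines())
# parse the chosen line, splitting on runs of whitespace
df = pd.read_csv(io.StringIO(line), sep=r'\s+', header=None)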
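For completeness, here is a minimal sketch of the skiprows/nrows route mentioned above. It assumes the file can be read twice (once to count lines, once to parse), and it writes a small temporary file with rows from the question as a stand-in for the real .dat file; the names rows, path, n_lines and k are only illustrative.

```python
import os
import random
import tempfile

import pandas as pd

# Stand-in data: three rows copied from the question's example.
rows = [
    "38 39 41 109 110",
    "39 111 112 113 114 115 116 117 118",
    "48 134 135 136",
]
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write("\n".join(rows) + "\n")
    path = tmp.name

# First pass: count the lines without keeping them in memory.
with open(path) as fd:
    n_lines = sum(1 for _ in fd)

# Pick a zero-based line number, then let read_csv skip to it and read one row.
k = random.randrange(n_lines)
df = pd.read_csv(path, sep=r"\s+", header=None, skiprows=k, nrows=1)

os.remove(path)  # clean up the stand-in file
```

The file is scanned twice, but only one line is ever parsed, so this may be preferable when the file is too large to hold comfortably in a list.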
BTW, your code cannot give you the expected dataframe. With
market_basket = pd.read_csv(data_url, header=None, delimiter='\n+', engine="python")
sample = market_basket.sample(n=1)
market_basket is a DataFrame with a single column containing the full lines, indexed by their line number in the file. So sample is the line with index 40911, containing 39 2787 2858 5016 5041 13569. To parse it, you still need to first extract the actual field (.iloc[0][0]) and split it:
sample = pd.read_csv(io.StringIO(sample.iloc[0][0]), sep=r'\s+', header=None)
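To make the index point concrete, here is a small sketch with a hand-built one-column DataFrame standing in for the result of your read_csv call (the three rows are from the question; lines, label, text and items are illustrative names). The number printed to the left of the sampled row is its zero-based index label, not part of the data. So, assuming no blank lines were skipped while reading, label k corresponds to line k + 1 in a text editor, which may be why the row seemed to be missing.

```python
import pandas as pd

lines = [
    "38 39 41 109 110",
    "39 111 112 113 114 115 116 117 118",
    "48 134 135 136",
]
# One column (label 0) holding each whole line as a single string.
market_basket = pd.DataFrame({0: lines})

sample = market_basket.sample(n=1)
label = sample.index[0]    # zero-based row label of the sampled line
text = sample.iloc[0, 0]   # the raw line, still one string
items = [int(tok) for tok in text.split()]  # split on whitespace

print(label, items)
```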
Upvotes: 1
Reputation: 409
Why use pandas? Can't you simply open the file with plain Python?
Something like:
import random
with open(filename) as a:
    data = a.read().splitlines()
line = random.choice(data)
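If the file is too big to load as a list, one alternative (not what the snippet above does) is reservoir sampling, which picks a uniformly random line in a single pass. A rough sketch follows; random_line is a hypothetical helper, and io.StringIO stands in for the open file.

```python
import io
import random


def random_line(stream, rng=random):
    """Return one uniformly chosen line from an iterable of lines, or None if empty."""
    chosen = None
    for count, line in enumerate(stream, start=1):
        # Keep the new line with probability 1/count (reservoir of size 1).
        if rng.randrange(count) == 0:
            chosen = line
    return None if chosen is None else chosen.rstrip("\n")


# Stand-in for open(filename): two rows from the question's example.
demo = io.StringIO("38 39 41 109 110\n48 134 135 136\n")
line = random_line(demo)
items = [int(tok) for tok in line.split()]  # the row as a list of ints
```

With a real file you would pass the file object itself, e.g. inside a with open(filename) block.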
Upvotes: 1