While fake data providers have existed for ages, offering any kind of data in your choice of format, it is not unusual to generate our own dataset at runtime. Fake data speeds up development: instead of waiting for real data, which may be slow and hard to obtain, why not use fake data for now and replace it with the real thing later? Fake it till you make it, they said.

Introducing the Faker library

Faker is a Python library that generates fake data. You can use it to bootstrap your database, fill your persistence layer to stress-test it, or maybe even sign up for a bogus website. You can install it with:

pip install Faker

You can fake names, addresses, phone numbers, or even credit card numbers with:

from faker import Faker
fake = Faker()

fake.name()
# 'Christian Hall'

fake.address()
# '70019 Robert Freeway Apt. 659\nMullinsburgh, CA 50585'

fake.phone_number()
# '+1-474-012-7514x76812'

fake.credit_card_number()
# '3582590448507517'

Or, even better, create a complete imaginary person's profile:

fake.profile()


# {'job': 'Chiropodist',
#  'company': 'Smith-Diaz',
#  'ssn': '001-30-6033',
#  'residence': '63871 Martinez Extensions Apt. 293\nEast Zachary, WY 07454',
#  'current_location': (Decimal('-58.6195145'), Decimal('-146.688835')),
#  'blood_group': 'B+',
#  'website': ['https://scott.biz/'],
#  'username': 'peterpetersen',
#  'name': 'Amy Richardson',
#  'sex': 'F',
#  'address': '23094 Sylvia Bypass\nWilliamsburgh, AK 77555',
#  'mail': 'sarah82@gmail.com',
#  'birthdate': datetime.date(1956, 8, 30)}

The good news is that the built-in selection of data is already pretty large [1], and Faker also supports community providers [2].

Generate fake data in CSV format

Now let's create a complete dataset of fake people, which we will use later in other posts on Antardata. To do so, we will use Python's built-in csv module [3].

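import csv
from faker import Faker

fake = Faker()
n_line = 10  # number of fake people to generate

# newline='' lets the csv module write its own line endings (as the csv docs recommend)
with open('people.csv', 'w', newline='') as csvfile:
    # every profile has the same keys, so one sample profile gives us the header
    fieldnames = fake.profile().keys()
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for _ in range(n_line):
        writer.writerow(fake.profile())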
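The csv module calls `str()` on every non-string value, which is why `current_location` and `website` end up in the file as Python reprs rather than plain values. If you prefer flatter columns, you could convert each profile before writing it. A stdlib-only sketch using a hypothetical `flatten_profile()` helper; the `latitude`/`longitude` column names and the separators are our own choices, not anything Faker defines:

```python
from datetime import date
from decimal import Decimal


def flatten_profile(profile):
    """Hypothetical helper: turn a Faker-style profile dict into plain strings."""
    flat = {}
    for key, value in profile.items():
        if key == 'current_location':
            # split the (latitude, longitude) tuple into two numeric columns
            flat['latitude'], flat['longitude'] = (str(v) for v in value)
        elif isinstance(value, list):
            # join multi-valued fields such as 'website' with a space
            flat[key] = ' '.join(value)
        elif isinstance(value, str):
            # keep each address on one line so spreadsheets show one row per person
            flat[key] = value.replace('\n', ', ')
        elif isinstance(value, date):
            flat[key] = value.isoformat()
        else:
            flat[key] = str(value)
    return flat


sample = {
    'name': 'Amy Richardson',
    'address': '23094 Sylvia Bypass\nWilliamsburgh, AK 77555',
    'current_location': (Decimal('-58.6195145'), Decimal('-146.688835')),
    'website': ['https://scott.biz/'],
    'birthdate': date(1956, 8, 30),
}
print(flatten_profile(sample))
```

To plug it into the writer above, build `fieldnames` from `flatten_profile(fake.profile()).keys()` and pass `flatten_profile(fake.profile())` to `writerow()`. The trade-off is that the file no longer round-trips into the original Python types without extra parsing.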

If you run the code above, you will find “people.csv” in your current working directory. But if you open it in Microsoft Excel or a similar program, it may look like a mess, because some values (such as the addresses) contain newlines.

Figure 1: Yuck! This kind of data triggers my OCD.
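To see what is going on, here is a small stdlib-only sketch that writes one value containing a newline into an in-memory buffer (`io.StringIO` standing in for the real file) and prints the raw text. The csv module wraps that field in double quotes, which CSV-aware readers honour but a naive line-by-line view does not:

```python
import csv
import io

# An in-memory buffer stands in for people.csv
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=['name', 'residence'])
writer.writeheader()
writer.writerow({'name': 'Amy Richardson', 'residence': 'USCGC Simpson\nFPO AP 79846'})

raw = buffer.getvalue()
print(repr(raw))
# 'name,residence\r\nAmy Richardson,"USCGC Simpson\nFPO AP 79846"\r\n'

# Reading it back with the csv module turns the quoted text into a single value again
buffer.seek(0)
row = next(csv.DictReader(buffer))
print(row['residence'] == 'USCGC Simpson\nFPO AP 79846')  # True
```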

Worry not! If we read the file with Python or another CSV-aware tool, each row comes back as one record, newlines and all:

import csv
with open('people.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)

# {'job': 'Civil Service fast streamer', 'company': 'Sanders, Cardenas and Daniels', 'ssn': '314-97-6382', 'residence': '7213 Heath Glens Suite 041\r\nFriedmanfort, UT 25348', 'current_location': "(Decimal('64.332196'), Decimal('117.320874'))", 'blood_group': 'A-', 'website': "['https://hernandez.net/', 'https://www.coleman-zamora.info/', 'https://www.wong.biz/', 'http://www.rhodes-curtis.com/']", 'username': 'joshua10', 'name': 'Sherri Meyer', 'sex': 'F', 'address': '542 Lawrence Mountains\r\nLake Sara, MT 55100', 'mail': 'ajefferson@gmail.com', 'birthdate': '1931-02-09'}
# {'job': 'Senior tax professional/tax inspector', 'company': 'Moore-Rangel', 'ssn': '538-77-2206', 'residence': 'USCGC Simpson\r\nFPO AP 79846', 'current_location': "(Decimal('74.7142165'), Decimal('-85.150605'))", 'blood_group': 'AB+', 'website': "['http://boyer.com/']", 'username': 'deborahrodriguez', 'name': 'Erik Romero', 'sex': 'M', 'address': '410 Justin Divide Apt. 165\r\nGaryhaven, NM 05095', 'mail': 'david29@hotmail.com', 'birthdate': '1943-08-14'}
# {'job': 'Research scientist (physical sciences)', 'company': 'Gibbs PLC', 'ssn': '272-10-3625', 'residence': '407 Duran Path\r\nJonesshire, RI 59379', 'current_location': "(Decimal('24.100642'), Decimal('-89.574301'))", 'blood_group': 'AB-', 'website': "['http://conrad-chavez.com/', 'http://www.ross.com/', 'https://www.johns.com/', 'http://wolfe.com/']", 'username': 'ulutz', 'name': 'Brian Ellis', 'sex': 'M', 'address': '0599 Thompson Passage Suite 308\r\nWeissfurt, NJ 51425', 'mail': 'scook@gmail.com', 'birthdate': '1909-01-02'}
# {'job': 'Health and safety adviser', 'company': 'Johnson and Sons', 'ssn': '769-06-0413', 'residence': '5319 Rhonda Wells Suite 364\r\nEast Stephanie, NV 30478', 'current_location': "(Decimal('48.3546345'), Decimal('22.146956'))", 'blood_group': 'AB+', 'website': "['https://cobb.org/', 'https://morris.com/', 'http://kaiser.net/']", 'username': 'maria54', 'name': 'Tyler Wolf', 'sex': 'M', 'address': '143 Jeffrey Valley Apt. 707\r\nEast Gregorychester, ME 58825', 'mail': 'angela57@gmail.com', 'birthdate': '1964-10-02'}
# {'job': 'Information systems manager', 'company': 'Freeman-Zamora', 'ssn': '005-95-7939', 'residence': '2701 Ritter Fork Apt. 595\r\nWest Mitchell, IL 16776', 'current_location': "(Decimal('18.502680'), Decimal('138.452795'))", 'blood_group': 'A+', 'website': "['https://www.hinton.com/', 'https://www.meadows.com/']", 'username': 'taylor32', 'name': 'Benjamin Fox', 'sex': 'M', 'address': '4763 Christine Way\r\nKempside, ND 56903', 'mail': 'robertjones@gmail.com', 'birthdate': '2020-05-09'}
# {'job': 'Chartered public finance accountant', 'company': 'Moore-Fuentes', 'ssn': '081-39-1939', 'residence': '9092 Johnson Roads\r\nKaitlynborough, ME 77234', 'current_location': "(Decimal('-53.919490'), Decimal('-125.242095'))", 'blood_group': 'B+', 'website': "['http://brown.com/', 'https://www.walker.org/', 'https://vaughn.com/', 'http://www.maddox.biz/']", 'username': 'ashley00', 'name': 'Helen Hutchinson', 'sex': 'F', 'address': '1319 Steve Valley Apt. 695\r\nEast Amy, OR 32465', 'mail': 'danielmoore@yahoo.com', 'birthdate': '1916-03-24'}
# {'job': 'Corporate investment banker', 'company': 'Riley-Greene', 'ssn': '177-64-7010', 'residence': 'Unit 3725 Box 8849\r\nDPO AE 54328', 'current_location': "(Decimal('57.9757505'), Decimal('178.662366'))", 'blood_group': 'AB-', 'website': "['http://www.robbins-smith.com/', 'http://bishop.com/']", 'username': 'erikclarke', 'name': 'Eric Wagner', 'sex': 'M', 'address': '98387 Reed Squares\r\nMcdanielview, MN 15602', 'mail': 'jessica85@gmail.com', 'birthdate': '1916-04-10'}
# {'job': 'Historic buildings inspector/conservation officer', 'company': 'Jones, Reed and Peterson', 'ssn': '531-39-6138', 'residence': '9243 Nguyen Lock\r\nChristianview, NJ 95643', 'current_location': "(Decimal('-17.749271'), Decimal('171.491715'))", 'blood_group': 'B-', 'website': "['http://www.fleming-mendoza.biz/', 'https://www.gallagher.com/', 'http://stone.com/', 'https://newman.com/']", 'username': 'uorr', 'name': 'Rhonda Adkins', 'sex': 'F', 'address': '624 Scott Mills Suite 221\r\nSouth Paulafurt, AK 20507', 'mail': 'jonesadam@hotmail.com', 'birthdate': '1941-11-25'}
# {'job': 'Technical brewer', 'company': 'Hopkins, James and Smith', 'ssn': '840-47-5731', 'residence': '91736 Phillips Burg Apt. 417\r\nHesschester, MA 70946', 'current_location': "(Decimal('-30.4801635'), Decimal('54.511119'))", 'blood_group': 'O-', 'website': "['http://adkins-holt.biz/']", 'username': 'smithbrenda', 'name': 'Robert Moore', 'sex': 'M', 'address': '6294 Summers Highway\r\nNorth Christopher, IN 56135', 'mail': 'mark61@hotmail.com', 'birthdate': '2011-01-17'}
# {'job': 'Insurance broker', 'company': 'Medina-Wright', 'ssn': '593-98-4869', 'residence': '13030 Cox River Apt. 780\r\nFergusonhaven, AR 15684', 'current_location': "(Decimal('-36.753118'), Decimal('-124.476044'))", 'blood_group': 'B-', 'website': "['http://carney.com/']", 'username': 'robinsonroger', 'name': 'Michael Little', 'sex': 'M', 'address': '48237 Meghan Valley\r\nEast Theodoremouth, HI 62861', 'mail': 'wmiller@gmail.com', 'birthdate': '1968-07-10'}
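Notice that every value comes back as a string: `birthdate` is `'1931-02-09'`, and `website` and `current_location` are the reprs the writer produced with `str()`. If you need real objects again, you could convert them after reading. A stdlib-only sketch; `parse_row()` is a hypothetical helper, and the regular expression assumes the `Decimal('...')` format shown in the output above:

```python
import ast
import re
from datetime import date
from decimal import Decimal

# Matches the numbers inside strings like "(Decimal('74.7142165'), Decimal('-85.150605'))"
DECIMAL_PATTERN = re.compile(r"Decimal\('(-?[0-9.]+)'\)")


def parse_row(row):
    """Hypothetical helper: convert string fields of one CSV row back to Python objects."""
    parsed = dict(row)
    parsed['birthdate'] = date.fromisoformat(row['birthdate'])
    # the website column holds a list literal, which literal_eval can safely evaluate
    parsed['website'] = ast.literal_eval(row['website'])
    latitude, longitude = (Decimal(n) for n in DECIMAL_PATTERN.findall(row['current_location']))
    parsed['current_location'] = (latitude, longitude)
    return parsed


row = {
    'birthdate': '1943-08-14',
    'website': "['http://boyer.com/']",
    'current_location': "(Decimal('74.7142165'), Decimal('-85.150605'))",
}
print(parse_row(row))
# {'birthdate': datetime.date(1943, 8, 14), 'website': ['http://boyer.com/'], 'current_location': (Decimal('74.7142165'), Decimal('-85.150605'))}
```

Inside the reading loop, you would call `parse_row(row)` on each row yielded by `csv.DictReader`. `ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for untrusted files.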

The code above is enough to read the CSV file, but take a look at the post below for a more in-depth explanation:

POST_PLACEHOLDER

References


[1] https://faker.readthedocs.io/en/stable/providers.html
[2] https://faker.readthedocs.io/en/stable/communityproviders.html
[3] https://docs.python.org/3/library/csv.html