Plot Yelp Halal Restaurants

Sun 24 December 2017 | tags: Python

Background and Overview

In my first two Python-related posts I scraped data from the web and turned it into information. In case you missed them, check out my NFL Salary Scraping Part 1 and Part 2, where I show how to use requests, BeautifulSoup, Pandas, and Matplotlib to learn that the highest-paid players in the NFL in 2017 are Larry Fitzgerald and Patrick Peterson.

This post is an evolution of those posts: instead of scraping data, I'm using the Yelp and Google Maps APIs to collect information and present the results on a map with markers.

For context, I used the Yelp Fusion API to perform a business search for the term 'halal'. By no means am I an expert on this topic, but the word halal means permissible to eat, and my wife and I frequent these restaurants, so I thought it would be interesting to map the locations near our house.

#imports using json, gmaps, and pandas
import json, gmaps
import pandas as pd

Load Data

Admittedly, to get the data from the Yelp API I used Postman instead of interfacing with the API directly. This was just a faster way to get a JSON file of the results I was looking for. To do this you need to create an app and get an API key from Yelp's developer website. I saved the results from Postman into a text file and worked with that file from there.
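If you would rather skip Postman, the same search can be sketched in Python. This is a minimal sketch, not what I actually ran: it assumes the Yelp Fusion business search endpoint with the API key sent as a bearer token, and the key and the "New Jersey" location are placeholders I made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Business search endpoint of the Yelp Fusion API.
YELP_SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(api_key, term, location):
    """Build (but do not send) a Yelp business search request."""
    query = urllib.parse.urlencode({"term": term, "location": location})
    request = urllib.request.Request(YELP_SEARCH_URL + "?" + query)
    # The API key travels in the Authorization header as a bearer token.
    request.add_header("Authorization", "Bearer " + api_key)
    return request


def fetch_businesses(api_key, term, location):
    """Send the request and return the parsed JSON response as a dict."""
    request = build_search_request(api_key, term, location)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder key and a hypothetical location; neither comes from this post.
example = build_search_request("YOUR_API_KEY", "halal", "New Jersey")
print(example.full_url)
```

Calling fetch_businesses with a real key should return the same kind of dictionary that json.loads produces from the saved Postman file, so the rest of the notebook would not need to change.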

# json.load parses the file object directly, so no intermediate string is needed
with open("Yelp_Halal_Businesses.JSON", 'r') as file:
    json_data = json.load(file)

Get Locations

In the cell below I iterate over the items in json_data['businesses'] and build two lists that store the latitude and longitude. I then zip those two lists into one so that I can pass the coordinates of the halal restaurants to gmaps.

lat = []
long = []
for item in json_data['businesses']:
    lat.append(item['coordinates']['latitude'])
    long.append(item['coordinates']['longitude'])

lat_long = list(zip(lat, long))
lat_long
[(40.56378, -74.694418),
 (40.5661650002003, -74.6275533735752),
 (40.55492, -74.52609),
 (40.5036928033583, -74.6444269892345),
 (40.62008, -74.49021),
 (40.55527, -74.52661),
 (40.598919, -74.480728),
 (40.437622, -74.538564),
 (40.6171176417404, -74.494637063977),
 (40.61933, -74.4927699),
 (40.4340821413289, -74.5478186003373),
 (40.62008, -74.49021),
 (40.4473744, -74.4966087),
 (40.4974297303228, -74.4482234326057),
 (40.4379627796526, -74.5361950967858),
 (40.5104061, -74.409281),
 (40.4345526013837, -74.5460039343956),
 (40.4977595713994, -74.4490539963252),
 (40.5721031, -74.3360702),
 (40.4992091, -74.4272637)]
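The two-list-then-zip approach can also be collapsed into a single helper. The sketch below builds the tuples directly and skips any business with missing coordinates. That guard is my assumption, since every entry in this data set had both values, and the two sample entries with coordinates reuse values from the output above.

```python
def extract_coordinates(businesses):
    """Return (latitude, longitude) tuples, skipping incomplete entries."""
    pairs = []
    for business in businesses:
        # Treat a missing or null 'coordinates' field as an empty dict.
        coords = business.get("coordinates") or {}
        lat, lng = coords.get("latitude"), coords.get("longitude")
        if lat is not None and lng is not None:
            pairs.append((lat, lng))
    return pairs


# Small stand-in for json_data; the third entry is deliberately incomplete.
sample = {
    "businesses": [
        {"coordinates": {"latitude": 40.56378, "longitude": -74.694418}},
        {"coordinates": {"latitude": 40.55492, "longitude": -74.52609}},
        {"coordinates": {"latitude": None, "longitude": None}},
    ]
}

print(extract_coordinates(sample["businesses"]))
```

In the notebook, lat_long = extract_coordinates(json_data['businesses']) would give the same list as the loop above when no coordinates are missing.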

Test the Google Maps API

The code snippet below is a copy/paste from the gmaps documentation. I did this to make sure I had the Google API key set up correctly.

The gmaps documentation is really quite good. Check it out: jupyter-gmaps.pdf

gmaps.configure(api_key="AI...")
marker_locations = [
(-34.0, -59.166672),
(-32.23333, -64.433327),
(40.166672, 44.133331),
(51.216671, 5.0833302),
(51.333328, 4.25)
]
fig = gmaps.figure()
markers = gmaps.marker_layer(marker_locations)
fig.add_layer(markers)
fig
%%HTML
<img src="map_1.png" />

Plot the locations

from ipywidgets.embed import embed_minimal_html

fig = gmaps.figure()
markers = gmaps.marker_layer(lat_long)
fig.add_layer(markers)

embed_minimal_html('export.html', views=[fig])
%%HTML
<img src="map_2.png" />
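One possible extension, which I did not do in this notebook, is to give each marker a small info box with the restaurant's name and rating. The sketch below only builds the HTML strings. It assumes that gmaps.marker_layer accepts an info_box_content list in the same order as the locations, so check the gmaps documentation before relying on it. The helper name and the second sample business are made up for illustration.

```python
import html


def build_info_boxes(businesses):
    """Return one small HTML snippet per business for a marker info box."""
    boxes = []
    for business in businesses:
        # Escape the name so characters like '&' render correctly as HTML.
        name = html.escape(str(business.get("name", "Unknown")))
        rating = business.get("rating", "n/a")
        boxes.append("<b>{}</b><br>Rating: {}".format(name, rating))
    return boxes


# Stand-in for json_data['businesses']; the second entry is hypothetical.
sample_businesses = [
    {"name": "4 Brothers Breakfast", "rating": 4.5},
    {"name": "Example Grill & Kebab", "rating": 4.0},
]
info_boxes = build_info_boxes(sample_businesses)
print(info_boxes)

# In the notebook this would be passed alongside the coordinates (assumed API):
# markers = gmaps.marker_layer(lat_long, info_box_content=info_boxes)
```

The list has to line up one-to-one with lat_long, so it should be built from the same json_data['businesses'] entries in the same order.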

Conclusion

The map visualization is great. I really enjoy maps, but to find the hidden treasure in the data you need to know your data and ask questions. By asking my data set which restaurants had more than 4 stars, were priced at 2 dollar signs, and had more than 50 reviews, I found a restaurant I had never heard of before. My wife and I will be checking this place out soon enough!

json_data.keys()
df = pd.DataFrame(json_data['businesses'])
df.columns
Index(['categories', 'coordinates', 'display_phone', 'distance', 'id',
       'image_url', 'is_closed', 'location', 'name', 'phone', 'price',
       'rating', 'review_count', 'transactions', 'url'],
      dtype='object')
df_final = df[['name','phone','price','rating','review_count','url']]
df_final[(df_final['rating'] > 4) & (df_final['price']== '$$') & (df_final['review_count'] > 50)]
                   name         phone price  rating  review_count                                                url
4  4 Brothers Breakfast  +19088348889    $$     4.5           183  https://www.yelp.com/biz/4-brothers-breakfast-...
