
CS 336 Mini-project MongoDB Trip Advisor Fall 2018

The purpose of this assignment is to get familiar with MongoDB, a NoSQL database: you will store data in MongoDB and query it using MongoDB's query language. At the end you will do the same in MySQL and write up your experience using these two databases, which follow very different design philosophies.

  1. Logistics

    The project is individual (no groups) and is due on Wednesday, December 12 at 11:55 pm on Sakai. You should submit a single document showing what you did and the results you obtained.

    You should be able to run a local instance of MongoDB and do all analysis locally, so no hosting, website, etc. should be necessary.

  2. MongoDB

    First download MongoDB:

    https://www.mongodb.com/download-center/community

    Then follow the installation instructions for your OS:

    macOS: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-os-x/
    Linux: https://docs.mongodb.com/manual/administration/install-on-linux/
    Windows: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-windows/

    After you have successfully installed and started MongoDB, you can use it from the mongo shell. Some commands that may be useful:

    show dbs      lists all databases on the server
    use cs336     switches to the cs336 database (MongoDB creates it when data is first written to it)

    Note that after running these commands you are working in the cs336 database. In it, create two collections:

    db.createCollection("reviews")
    db.createCollection("test")

    If you run show collections, you will see that the two new collections have been created. You can check that both are empty with either find() or count(): find() returns all documents in the collection, and count() returns the number of documents in it.

    For example, db.reviews.count() returns how many documents your reviews collection contains, and replacing count() with find() prints the documents themselves. Appending pretty() to find(), as in db.reviews.find().pretty(), prints the JSON documents in an indented, readable form. (In newer shells count() is deprecated in favor of countDocuments({}).)

    The results are as follows:

    > db.createCollection("reviews")
    { "ok" : 1 }
    > db.createCollection("test")
    { "ok" : 1 }
    > db.reviews.count()
    0
    > db.reviews.find().pretty()
    >

    After you have completed the installation of MongoDB, you should be able to import the TripAdvisor reviews file.

  3. Trip Advisor data

    Download the data from Google Drive.

    The data has already been cleaned (to an extent) for you, so you do not have to worry about it. The code used to clean up some fields is shown below; you may want to modify it to clean other parts of the data.

    // Convert the string ratings and dates of every embedded review to proper types.
    db.reviews.find().forEach(function (doc) {
        doc.Reviews.forEach(function (review) {
            // Overall: round to one decimal place and store it as a number.
            var newOverall = review.Ratings.Overall;
            var OverallFloat = parseFloat(newOverall).toFixed(1);
            review.Ratings.Overall = parseFloat(OverallFloat);

            // Service and Value are optional, so convert them only when present.
            if (review.Ratings.Service) {
                var newService = review.Ratings.Service;
                review.Ratings.Service = parseInt(newService, 10);
            }

            if (review.Ratings.Value) {
                var newValue = review.Ratings.Value;
                review.Ratings.Value = parseInt(newValue, 10);
            }

            // Store the review date as a real Date (ISODate) instead of a string.
            if (review.Date) {
                var newDate = review.Date;
                review.Date = new Date(newDate);
            }
        });
        // Write the modified document back. replaceOne works in both the legacy
        // mongo shell and mongosh, where save() has been deprecated.
        db.reviews.replaceOne({ _id: doc._id }, doc);
    });
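The script above converts only Overall, Service, Value and Date. A minimal sketch of the suggested extension, assuming every optional rating is an integer stored as a string and using the field names from the document format below (normalizeReview and OPTIONAL_RATINGS are hypothetical names): it applies the same conversion to every rating, and it only writes back to the database when run inside the mongo shell, so the helper itself can also be checked with plain Node.js.

```javascript
// Hypothetical list of optional integer ratings, taken from the document format.
const OPTIONAL_RATINGS = ["Service", "Cleanliness", "Value", "Sleep Quality", "Rooms", "Location"];

// Hypothetical helper: converts one review's string ratings and date in place.
// Assumes Overall is a decimal string such as "4.0" and the rest are integer strings.
function normalizeReview(review) {
  const ratings = review.Ratings || {};
  if (ratings.Overall != null && ratings.Overall !== "") {
    // Same rounding as the original script: one decimal place, stored as a number.
    ratings.Overall = parseFloat(parseFloat(ratings.Overall).toFixed(1));
  }
  for (const key of OPTIONAL_RATINGS) {
    // Only convert ratings that are present; missing keys stay missing.
    if (ratings[key] != null && ratings[key] !== "") {
      ratings[key] = parseInt(ratings[key], 10);
    }
  }
  if (review.Date && !(review.Date instanceof Date)) {
    review.Date = new Date(review.Date);
  }
  return review;
}

// Only touch the database when running inside the mongo shell, where `db` exists.
if (typeof db !== "undefined") {
  db.reviews.find().forEach(function (doc) {
    (doc.Reviews || []).forEach(normalizeReview);
    db.reviews.replaceOne({ _id: doc._id }, doc);
  });
}
```

Because the guard checks for `db`, the same file can be loaded in the shell to clean the collection or run under Node to test the conversion on a sample review.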

    Extract the archive reviews.tar.bz2 (for example with tar -xjf reviews.tar.bz2) and import the data into your MongoDB instance:

    mongoimport --db cs336 --collection reviews --file reviews.json
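After the import, a quick way to confirm the data arrived is to count hotel documents and embedded reviews in a single aggregation. This is a sketch assuming the collection is cs336.reviews and each document keeps its reviews in a Reviews array, as in the format below; importSummaryPipeline is a hypothetical name.

```javascript
// Hypothetical pipeline: one output document holding the number of hotel
// documents and the total number of embedded reviews across all of them.
function importSummaryPipeline() {
  return [
    // $ifNull guards against hotel documents that have no Reviews array.
    { $project: { reviewCount: { $size: { $ifNull: ["$Reviews", []] } } } },
    { $group: { _id: null, hotels: { $sum: 1 }, reviews: { $sum: "$reviewCount" } } },
  ];
}

// Run only inside the mongo shell (after `use cs336`), where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.reviews.aggregate(importSummaryPipeline()).toArray());
}
```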

    The review document will have the following format:

    {
        "id": string,
        "Reviews": [ {
            "Ratings": {
                "Service" (optional): numeric,
                "Cleanliness" (optional): numeric,
                "Overall": numeric,
                "Value" (optional): numeric,
                "Sleep Quality" (optional): numeric,
                "Rooms" (optional): numeric,
                "Location" (optional): numeric
            },
            "AuthorLocation": string,
            "Title": string,
            "Author": string,
            "ReviewID": string,
            "Content": string,
            "Date": ISODate()
        } ],
        "HotelInfo": {
            "Name": string,
            "HotelURL": string,
            "Price": string,
            "Address": string,
            "HotelID": string,
            "ImgURL": string
        }
    }

  4. Review patterns

    After you have successfully downloaded and stored the reviews in your database, you are ready to write queries that reveal interesting patterns in them. You may include graphs of these patterns if you like. Your queries should include at least the following:

    1. Find all reviews for a hotel ‘Desert Rose Resort’.
    2. Number of ratings for each hotel. Sort the results.
    3. Average overall rating for each hotel. Sort the results.
    4. For each hotel, the number of 5.0 overall ratings it received.
    5. Number of ratings given out per month/day of week.
    6. Number of reviews per author.

      You might notice that there are multiple hotels named ‘Desert Rose Resort’. In that case, use the HotelID of the first ‘Desert Rose Resort’. Alternatively, you can combine the two hotels into one; MongoDB provides an operator to do this.
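Query 1 and the ‘Desert Rose Resort’ note can be handled in two steps: look up the HotelIDs that carry the name, then fetch the reviews of the first one. A minimal sketch with hypothetical helper names; the third pipeline merges all same-named hotels with $group and $push, which is one way to combine them but not necessarily the operator the assignment has in mind.

```javascript
// Step 1: every HotelID that carries the given name, in natural (stored) order.
function hotelIdsByName(name) {
  return [
    { $match: { "HotelInfo.Name": name } },
    { $project: { _id: 0, hotelId: "$HotelInfo.HotelID" } },
  ];
}

// Step 2a: all reviews of one specific hotel, one output document per review.
function reviewsForHotelId(hotelId) {
  return [
    { $match: { "HotelInfo.HotelID": hotelId } },
    { $unwind: "$Reviews" },
    { $replaceRoot: { newRoot: "$Reviews" } }, // requires MongoDB 3.4 or later
  ];
}

// Step 2b (alternative): treat all hotels with this name as a single hotel by
// grouping on the name and collecting every review into one list.
function reviewsMergedByName(name) {
  return [
    { $match: { "HotelInfo.Name": name } },
    { $unwind: "$Reviews" },
    {
      $group: {
        _id: "$HotelInfo.Name",
        hotelIds: { $addToSet: "$HotelInfo.HotelID" },
        reviews: { $push: "$Reviews" },
      },
    },
  ];
}

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  const ids = db.reviews.aggregate(hotelIdsByName("Desert Rose Resort")).toArray();
  if (ids.length > 0) {
    printjson(db.reviews.aggregate(reviewsForHotelId(ids[0].hotelId)).toArray());
  }
}
```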

      You can think of many more queries like that.
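Queries 2 to 6 in the list above follow one aggregation pattern: unwind the embedded Reviews array, group, then sort. This is a sketch under two assumptions: the cleanup script has already turned Ratings.Overall into a number and Date into a real date (otherwise the $match on 5 and the date operator will find nothing), and hotels are grouped by HotelInfo.HotelID so that two hotels sharing a name stay separate. The function names are hypothetical.

```javascript
// Shared first stage: one document per embedded review.
const unwindReviews = { $unwind: "$Reviews" };
// Group key that keeps same-named hotels apart while still showing the name.
const hotelKey = { id: "$HotelInfo.HotelID", name: "$HotelInfo.Name" };

// 2. Number of ratings per hotel, most-reviewed first.
function ratingsPerHotel() {
  return [unwindReviews, { $group: { _id: hotelKey, ratings: { $sum: 1 } } }, { $sort: { ratings: -1 } }];
}

// 3. Average Overall rating per hotel, highest first.
function averageOverallPerHotel() {
  return [
    unwindReviews,
    { $group: { _id: hotelKey, avgOverall: { $avg: "$Reviews.Ratings.Overall" } } },
    { $sort: { avgOverall: -1 } },
  ];
}

// 4. Number of 5.0 Overall ratings each hotel received.
function fiveStarCountPerHotel() {
  return [
    unwindReviews,
    { $match: { "Reviews.Ratings.Overall": 5 } },
    { $group: { _id: hotelKey, fiveStar: { $sum: 1 } } },
    { $sort: { fiveStar: -1 } },
  ];
}

// 5. Ratings per day of week ($dayOfWeek: 1 = Sunday ... 7 = Saturday).
//    Replace $dayOfWeek with $month to count per month instead.
function ratingsPerDayOfWeek() {
  return [
    unwindReviews,
    { $match: { "Reviews.Date": { $type: "date" } } }, // skip unconverted dates
    { $group: { _id: { $dayOfWeek: "$Reviews.Date" }, ratings: { $sum: 1 } } },
    { $sort: { _id: 1 } },
  ];
}

// 6. Number of reviews per author, most prolific first.
function reviewsPerAuthor() {
  return [unwindReviews, { $group: { _id: "$Reviews.Author", reviews: { $sum: 1 } } }, { $sort: { reviews: -1 } }];
}

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.reviews.aggregate(ratingsPerHotel()).toArray().slice(0, 10));
}
```

For large collections, passing { allowDiskUse: true } as the second argument to aggregate() lets $group and $sort use temporary files instead of failing at the in-memory limit.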

      We expect you to submit a number of MongoDB queries you have written to find interesting patterns in the data. It is enough to submit the queries themselves and the results they produced.

  5. The same in MySQL

    MANDATORY: Create MySQL tables (multiple if required) to store this type of data. The schema should be able to handle all the data. Insert two or three rows of data into the tables you created, and rewrite the queries you ran in MongoDB for MySQL.
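One possible relational layout uses two tables: a hotels table keyed by HotelID and a reviews table keyed by ReviewID with a HotelID foreign key, where each optional rating becomes a nullable column. The table and column names here are illustrative assumptions, not a required schema. The sketch shows how one MongoDB document splits into rows for those tables; it is plain JavaScript with no database access, so you can use it to check your schema against a sample document or to prepare the two or three rows you insert.

```javascript
// Hypothetical mapping from MongoDB rating keys to relational column names.
const RATING_COLUMNS = {
  Overall: "overall",
  Service: "service",
  Cleanliness: "cleanliness",
  Value: "value",
  "Sleep Quality": "sleep_quality",
  Rooms: "rooms",
  Location: "location",
};

// Splits one hotel document into one hotels row and one reviews row per review.
function toRelationalRows(doc) {
  const info = doc.HotelInfo || {};
  const hotel = {
    hotel_id: info.HotelID,
    name: info.Name,
    hotel_url: info.HotelURL,
    price: info.Price,
    address: info.Address,
    img_url: info.ImgURL,
  };
  const reviews = (doc.Reviews || []).map(function (r) {
    const row = {
      review_id: r.ReviewID,
      hotel_id: info.HotelID, // foreign key to hotels.hotel_id
      author: r.Author,
      author_location: r.AuthorLocation,
      title: r.Title,
      content: r.Content,
      review_date: r.Date,
    };
    // Optional ratings become nullable columns: a missing key maps to null.
    const ratings = r.Ratings || {};
    for (const field of Object.keys(RATING_COLUMNS)) {
      row[RATING_COLUMNS[field]] = ratings[field] !== undefined ? ratings[field] : null;
    }
    return row;
  });
  return { hotel: hotel, reviews: reviews };
}
```

The one-to-many split is the main design difference from MongoDB: the embedded Reviews array becomes a separate table, so the per-hotel queries need a JOIN or a GROUP BY on hotel_id.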

    OPTIONAL: If you want to try, you can export your data from MongoDB as a .csv file and insert it into MySQL. The command for doing that is:

    sudo mongoexport --db cs336 --collection reviews --type=csv \
        --fieldFile fields.txt --out ~/Desktop/reviews/reviews.csv

    You might have to create new collections from the reviews collection to match your MySQL schema.
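Because each hotel document embeds an array of reviews, a CSV export of the original collection produces one row per hotel. A sketch of one way to build a per-review collection, assuming a target collection named review_rows (a hypothetical name) and the field names from the document format above: $unwind produces one document per review, $project flattens the nested fields, and $out writes the result to a new collection.

```javascript
// Hypothetical pipeline that writes one flat document per review into a new
// collection, so that mongoexport can emit one CSV row per review.
function flattenReviewsPipeline(targetCollection) {
  return [
    { $unwind: "$Reviews" },
    {
      $project: {
        _id: 0, // let $out generate fresh _id values
        HotelID: "$HotelInfo.HotelID",
        ReviewID: "$Reviews.ReviewID",
        Author: "$Reviews.Author",
        AuthorLocation: "$Reviews.AuthorLocation",
        Title: "$Reviews.Title",
        Content: "$Reviews.Content",
        Date: "$Reviews.Date",
        Overall: "$Reviews.Ratings.Overall",
        Service: "$Reviews.Ratings.Service",
        Cleanliness: "$Reviews.Ratings.Cleanliness",
        Value: "$Reviews.Ratings.Value",
        SleepQuality: "$Reviews.Ratings.Sleep Quality", // source key contains a space
        Rooms: "$Reviews.Ratings.Rooms",
        Location: "$Reviews.Ratings.Location",
      },
    },
    // $out must be the last stage; it replaces targetCollection if it exists.
    { $out: targetCollection },
  ];
}

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.reviews.aggregate(flattenReviewsPipeline("review_rows"));
  printjson(db.review_rows.findOne());
}
```

A second pipeline over HotelInfo alone (without $unwind) would give the matching hotels collection.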

    You may have to change the database name, collection name, field file name, and output path to match your setup. The only thing you need to write is the fields.txt file, which specifies the field or fields to include in the export. The file must contain only one field per line, and each line must end with the LF character. For more information, see the MongoDB manual:

    https://docs.mongodb.com/manual/reference/program/mongoexport/
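Because the field-file format is strict (one field per line, LF line endings), a tiny Node.js script can generate it instead of an editor that might save CRLF endings on Windows. A sketch with hypothetical helper names; the field list assumes a flattened per-review collection and should be replaced with the fields your MySQL schema needs.

```javascript
const fs = require("fs");

// Illustrative field list; adjust it to the columns of your MySQL tables.
const FIELDS = ["HotelID", "ReviewID", "Author", "Date", "Overall", "Service", "Value"];

// Builds the file contents: every line, including the last, ends with "\n" (LF).
function fieldFileContents(fields) {
  return fields.map(function (f) { return f + "\n"; }).join("");
}

// Writes the field file to disk; no CR characters are ever emitted.
function writeFieldFile(path, fields) {
  fs.writeFileSync(path, fieldFileContents(fields), "utf8");
}
```

Calling writeFieldFile("fields.txt", FIELDS) once produces a file you can pass to mongoexport with --fieldFile fields.txt.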

    After you import your data into MySQL, you can write the same queries in SQL and note differences in performance, ease of use, or anything else you find interesting when comparing the two.