neoncube we're all grateful for you taking the time with this project! However, I'd suggest that it is more important to start providing features to the site (like audio, multiple choice, classes/groups) as it's not so functional in its current state. A timeline of your intentions would be most welcome 🙂

    Thanks, ArmyMFL 🙂

    I don't have a timeline for features, sorry.

    At the moment, I'm splitting my time between cloning courses and adding new features, which I think is a good approach 🙂

    I've finished uploading the audio files to the cloud and am getting ready to start uploading audio for courses! 🙂

    It's amazing how much variety there is in the community courses! :D

    While migrating courses, I randomly noticed that there are two courses for German speakers to learn Urdu, and both of them even have full audio! 😮

      neoncube Amazing things happen when you harness people's willingness to create. This is how Wikipedia got so big! That's why it's shameful that Memrise doesn't make good use of it…

        I've uploaded most courses in the following categories:

        • American Sign Language
        • Korean Language
        • Galician
        • Haitian
        • Indo European Languages
        • Architecture
        • English for English speakers

        I've also made some improvements to the cloning script and am rerunning it for all categories, to have it upload more courses that previously would have required manual intervention 🙂

        Sorry I'm a bit late to ask if you can migrate my course. I was still creating it when the forums went down and it was unlisted at the time, so I don't think it was part of your migration. It's public on Memrise now and I'd love to move it here. It's https://app.memrise.com/community/course/6491606/duolingo-irish-v2-2023/ which is a successor to another course you already migrated (https://mylittlewordland.com/course/59908). My username on Memrise is juniakaiser.

          5 days later

          Hi @[deleted], I just wondered if you have downloaded all the courses I have created and those I support?

          Please let me know when you get a chance to upload them and I'll go searching for them.

          At present many are still missing.

          This » one « should be under Literature.

          And these » ones « should be under "Religion - Christianity".


          Once assigned, will we be able to reassign the category for courses in the list you have produced?


          I'd also appreciate a way to search for courses by key words as that could be far quicker than going through a long list of them.

          On Memrise one could search within a selected category, e.g. Geography, then the key word Earthquake.

            Hi DW7 🙂 I think all of those things are still a work in progress 🙂 Yes, you can change the category via the Edit -> Course info -> "Learning language" dropdown.

            • DW7 likes this.

            Just got back from my Christmas/New Year/birthday vacation! 🙂

            I've begun running a script that's downloading unlisted courses. This should download courses that:

            • Were created by Memrise and later unlisted
            • Were beyond the 666 per-category page limit
            • Were recently switched from private to public

            I expect this script to take about 1 week to fully run, since it has to check a lot of courses (about 2.5 million). Currently, it looks like about 1/5 of all courses were unlisted! o_o

            I ran the script overnight, and it checked about 250,000 courses, so it may take a couple of weeks to download everything 🙂

            Interestingly, the number of courses that are being discovered and downloaded rose from 20% to 70%! o_o

            • DW7 likes this.

            NewLandRise I don't think there's an officially published number, but according to this post, it looks like the course numbers go up to 6,500,000 or so: https://forum.mylittlewordland.com/d/57-unlisted-official-arabic-courses/5

            Some of the courses have been deleted or are private, but I'd expect at least 3-4 million courses or so to be public.

            I continued running the script last night, and it looks like it's able to pretty consistently scan 300k courses per night. For 6,500,000 courses, that'd take a little over 20 nights, and that's just for downloading the basic course information (name, category, author, description, etc.). After that, I'd need to download each level's vocabulary and audio/images, so we'd probably be looking at about 40 evenings of script running, plus processing time of perhaps a week. That's pretty long, so I probably need to start also running the script during the day or increasing the number of courses that are downloaded in parallel. Currently, I'm downloading 5 in parallel 🙂
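
As a rough check on the estimate above, here is a minimal sketch of the arithmetic. The function name `nightsNeeded` is made up for illustration, and the figures are the approximate ones quoted in the post (about 6,500,000 course IDs, about 300k scanned per night).

```javascript
// Back-of-the-envelope estimate using the approximate figures from the post.
const totalCourses = 6_500_000;  // rough upper bound on course IDs
const coursesPerNight = 300_000; // observed overnight scan rate

// Hypothetical helper: whole nights needed to scan `total` courses.
function nightsNeeded(total, perNight) {
  return Math.ceil(total / perNight);
}

console.log(nightsNeeded(totalCourses, coursesPerNight));     // 22
console.log(nightsNeeded(totalCourses, coursesPerNight * 2)); // 11
```

Doubling the throughput (for example by also running during the day) would roughly halve the number of nights, which matches the reasoning in the post.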

              I have more questions if you don't mind:

              1.) How many courses were you able to discover (and download) from the public course categories?

              2.) I assume that you are now accessing the courses through the course ID (from 1 to 6537036). Are you able to retrieve the status that each course is set to? By status I mean that when a course is created it can be set to one of three states: Incomplete, Unlisted or Public. Did you find a way to retrieve this information?
              I guess there must be tons of courses that are half finished and still in the Incomplete/Unlisted status. And Memrise hasn't implemented any access control for Incomplete/Unlisted courses, so anyone can access any course. How will you filter out such courses?

                neoncube The Eltaurus' script became too heavy to be of practical use when the number increased in batch mode, so I modified it in my own way.

                https://github.com/7shi/CourseDump2022

                As far as I tried, I could download 500,000 files of 3,000 courses at once in batch mode. (It seems that if I increase the number of courses any more, the V8 Engine crashes in the process.)

                First of all, I made a pull request for the part about controlling the number of simultaneous connections, but it has not been merged yet.

                https://github.com/Eltaurus-Lt/CourseDump2022/pull/38

                Note: I'm afraid of getting banned for overloading, so I am refraining from using this script at this time.

                  7shi Hm, interesting…

                  If I try to download more than about 5 courses at a time (e.g. just downloading the HTML of https://app.memrise.com/community/course/<id>/course-name/), I start to get this response:

                  <html>
                  <head><title>502 Bad Gateway</title></head>
                  <body>
                  <center><h1>502 Bad Gateway</h1></center>
                  <hr><center>nginx</center>
                  </body>
                  </html>

                  I'd assumed I was overloading the Memrise servers, so I backed off to 5.

                  It looks like the Memrise audio and image files are stored on Amazon S3, though, and I can download about 50 of those files in parallel before getting 502 errors 🙂

                  I wonder if I'm being rate limited because of the large number of courses and S3 files that I've downloaded over the past few weeks.

                  Either way, I'm happy just downloading 5 courses in parallel for a week or so. Hopefully that'll be lighter on the Memrise servers, too 🙂
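
For readers wondering what "5 in parallel" might look like in code, here is a minimal sketch of a concurrency-limited loop. It is not the script used in this thread. `mapWithLimit` is a made-up helper name, and `downloadCourse` in the usage comment is a hypothetical function standing in for whatever actually fetches a course.

```javascript
// Runs `worker` over `items`, never allowing more than `limit` calls in flight.
// Results are returned in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner repeatedly takes the next unclaimed item until none are left.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Usage sketch (downloadCourse is hypothetical):
// const pages = await mapWithLimit(courseIds, 5, (id) => downloadCourse(id));
```

Because a fixed number of runners share one queue, lowering the load on the server is just a matter of passing a smaller `limit`.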

                    neoncube 502 occurs rather frequently. If I wait a little and try again, it almost always succeeds. I've included retry code as a countermeasure:

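                    // Not shown in the original post: a minimal sleep helper so the snippet runs on its own.
                    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

                    // Fetch `url`, retrying on 502 responses and on network errors.
                    async function fetchRetry(url, options, retries = 3, interval = 1000) {
                      let ret;
                      for (let i = 0; i < retries; i++) {
                        if (i) console.log("retry", i);
                        // Short pause before the first attempt, longer pause before each retry.
                        await sleep(i ? interval : 200);
                        try {
                          ret = await fetch(url, options);
                          if (ret.status !== 502) break;
                        } catch (e) {
                          // Rethrow only if the final attempt also failed.
                          if (i === retries - 1) throw e;
                        }
                      }
                      // May still be a 502 response if every attempt returned one.
                      return ret;
                    }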

                      NewLandRise I was able to discover around 220,000 courses from the public courses categories 🙂

                      I'd totally forgotten that Memrise lets one set a course to "Incomplete" or "Unlisted"! Indeed, I'm just scanning the courses from 1 to 7,000,000'ish, and this might explain why I'm hitting so many courses that weren't listed via the public course categories! 🙂

                      I just took a quick look at an unlisted course, and it doesn't look like it has anything to indicate that it's unlisted. I can think of a couple of ways that we might be able to determine whether a course was incomplete/unlisted or not present on the course categories pages because it wasn't popular enough to be in the first 666 pages of a category:

                      1. Check to see if the course belongs to a category that has at least 666 pages. If not, then the course should be incomplete/unlisted.
                      2. Download each user's profile and compile a list of all courses taught by all users. If a course is on this list, then it must be public. I'm not sure if this would work for courses where the author has already deactivated their account, though.
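
The two checks above could be combined roughly as follows. This is only a sketch under stated assumptions: the function name `guessVisibility` and the data shapes (`listedIds`, `pagesPerCategory`, `profileCourseIds`) are hypothetical and would have to be built from the downloaded data. Nothing here is an actual Memrise API.

```javascript
const PAGE_LIMIT = 666; // per-category page limit mentioned in the thread

// All inputs are hypothetical shapes, assumed for illustration:
//   course:           { id, category }
//   listedIds:        Set of course IDs found on the public category pages
//   pagesPerCategory: Map of category name -> number of listing pages seen
//   profileCourseIds: Set of course IDs seen on any user's profile
function guessVisibility(course, listedIds, pagesPerCategory, profileCourseIds) {
  // Found on a category page: public by definition.
  if (listedIds.has(course.id)) return "public";

  // Check 2: a course shown on a user's profile is treated as public.
  if (profileCourseIds.has(course.id)) return "public";

  // Without page data for the category, check 1 cannot be applied.
  if (!pagesPerCategory.has(course.category)) return "unknown";

  // Check 1: if the category never hit the page limit, every public course in
  // it would have been listed, so a missing course is incomplete or unlisted.
  if (pagesPerCategory.get(course.category) < PAGE_LIMIT) {
    return "incomplete-or-unlisted";
  }

  // The category hit the limit, so the course may simply be beyond the cutoff.
  return "unknown";
}
```

Courses that come back as "unknown" are the ones where neither check is conclusive, for example a course in a very large category whose author has deactivated their account.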